
Active Learning for Classification with Maximum Model Change

Authors:
Wenbin Cai, Bing, Microsoft, Beijing, China
Yexun Zhang (ORCID 0000-0002-5390-9053), Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Ya Zhang, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Siyuan Zhou, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Wenquan Wang, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Zhuoxiang Chen, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Chris Ding, University of Texas at Arlington, TX

ACM Transactions on Information Systems, Volume 36, Issue 2, Article No. 15, pp. 1–28. https://doi.org/10.1145/3086820

Published: 31 August 2017

Abstract

Most existing active learning studies focus on designing sample selection algorithms. However, several fundamental problems deserve investigation to provide deep insight into active learning. In this article, we conduct an in-depth investigation of active learning for classification from the perspective of model change. We derive a general active learning framework for classification called maximum model change (MMC), which aims to query influential examples. The model change is quantified as the difference between the model parameters before and after training with the expanded training set. Inspired by the stochastic gradient update rule, the gradient of the loss with respect to a given candidate example is adopted to approximate the model change. This framework is applied to two popular classifiers: support vector machines and logistic regression. We analyze the convergence property of MMC and theoretically justify it. We explore the connection between MMC and uncertainty-based sampling to provide a unified view. In addition, we discuss how MMC might extend to other learning models and show its applicability in a wide range of applications. We validate the MMC strategy on two kinds of benchmark datasets, the UCI repository and ImageNet, and show that it outperforms many state-of-the-art methods.
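
The gradient-based approximation of model change described in the abstract can be illustrated with a small sketch for logistic regression. This is not the authors' implementation. It assumes a linear model with labels in {-1, +1} and logistic loss, and, because the label of a candidate is unknown before querying, it assumes the score is the expectation of the gradient norm over the model's own predicted label distribution. The function names `expected_gradient_norm` and `select_query` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def expected_gradient_norm(w, X):
    """Score each candidate (row of X) by the expected norm of the
    logistic-loss gradient, used here as a proxy for model change.

    Assumptions (not taken from the source): labels are in {-1, +1},
    the loss is log(1 + exp(-y * w.x)), and the expectation is taken
    over the current model's predicted label probabilities.
    """
    f = X @ w                        # decision values w.x
    p = sigmoid(f)                   # P(y = +1 | x) under the current model
    x_norm = np.linalg.norm(X, axis=1)
    # Gradient for label y is -y * sigmoid(-y * f) * x, so its norm is
    # sigmoid(-y * f) * ||x||.
    norm_if_pos = (1.0 - p) * x_norm  # y = +1
    norm_if_neg = p * x_norm          # y = -1
    return p * norm_if_pos + (1.0 - p) * norm_if_neg


def select_query(w, X_pool):
    """Return the index of the pool example with the largest score."""
    return int(np.argmax(expected_gradient_norm(w, X_pool)))


# Toy usage with a hypothetical two-dimensional pool.
w = np.array([1.0, 0.0])
X_pool = np.array([[3.0, 0.0],   # confidently classified, large norm
                   [0.0, 1.0],   # on the decision boundary
                   [0.1, 0.0]])  # near the boundary, small norm
scores = expected_gradient_norm(w, X_pool)
chosen = select_query(w, X_pool)
```

Under these assumptions the score simplifies to 2·p·(1−p)·‖x‖, which for a fixed feature norm is largest at p = 0.5. That is one way to see the link between model change and uncertainty-based sampling that the abstract mentions, although the article's own derivation may differ.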

Index Terms

1. Information systems
  1. Information systems applications

Published in
ACM Transactions on Information Systems Volume 36, Issue 2
April 2018
371 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3133943
Editor: Maarten de Rijke, University of Amsterdam, The Netherlands
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 August 2017
- Accepted: 1 April 2017
- Revised: 1 February 2017
- Received: 1 November 2016
Author Tags
Active learning
classification
logistic regression
maximum model change
support vector machines
Qualifiers
- research-article
- Research
- Refereed