research-article

Active Learning for Classification with Maximum Model Change

Published: 31 August 2017

Abstract

Most existing active learning studies focus on designing sample selection algorithms. However, several fundamental problems deserve investigation to provide deep insight into active learning. In this article, we conduct an in-depth investigation of active learning for classification from the perspective of model change. We derive a general active learning framework for classification called maximum model change (MMC), which aims at querying the most influential examples. The model change is quantified as the difference between the model parameters before and after training with the expanded training set. Inspired by the stochastic gradient update rule, the gradient of the loss with respect to a given candidate example is adopted to approximate the model change. This framework is applied to two popular classifiers: support vector machines and logistic regression. We analyze the convergence property of MMC and theoretically justify it. We explore the connection between MMC and uncertainty-based sampling to provide a unified view. In addition, we discuss its potential applicability to other learning models and show that it applies to a wide range of applications. We validate the MMC strategy on two kinds of benchmark datasets, the UCI repository and ImageNet, and show that it outperforms many state-of-the-art methods.
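The gradient-based approximation described in the abstract can be illustrated with a minimal sketch for binary logistic regression. This is not the authors' implementation: the abstract does not say how the unknown label of a candidate is handled, so the sketch assumes the gradient norm is averaged over the current model's own label probabilities. The function names `expected_gradient_norm` and `select_query`, and the use of labels in {0, 1}, are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def expected_gradient_norm(w, X_pool):
    """Score each unlabeled row of X_pool by the expected norm of the
    log-loss gradient it would contribute to a stochastic gradient step.

    Assumption (not stated in the abstract): the unknown label is
    marginalized using the current model's predicted probabilities.
    """
    p = sigmoid(X_pool @ w)                  # P(y = 1 | x) under the current model
    x_norm = np.linalg.norm(X_pool, axis=1)  # ||x|| for every candidate
    # For a label y in {0, 1}, the log-loss gradient w.r.t. w is (p - y) * x.
    norm_if_pos = (1.0 - p) * x_norm         # gradient norm if the label were 1
    norm_if_neg = p * x_norm                 # gradient norm if the label were 0
    return p * norm_if_pos + (1.0 - p) * norm_if_neg


def select_query(w, X_pool):
    """Return the index of the candidate with the largest estimated model change."""
    return int(np.argmax(expected_gradient_norm(w, X_pool)))
```

Under these assumptions the score reduces to 2p(1 − p)‖x‖, which is largest when the model is most uncertain (p near 0.5). This is one way to see the link between model change and uncertainty-based sampling that the abstract mentions.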

    • Published in

      ACM Transactions on Information Systems  Volume 36, Issue 2
      April 2018
      371 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/3133943
      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 August 2017
      • Accepted: 1 April 2017
      • Revised: 1 February 2017
      • Received: 1 November 2016
      Qualifiers

      • research-article
      • Research
      • Refereed
