Abstract
Active learning (AL) reduces human labeling effort by training a classifier on a small labeled set, formed from the instances that reduce the generalization error the most. Although AL reduces the labeling cost, most pool-based query strategies evaluate every unlabeled instance in each iteration of query instance selection, which makes them computationally expensive. Query strategies also often select redundant or overlapped instances, which yield no improvement in generalization performance. This work proposes a query strategy for the pool-based scenario that uses parametric-equation-based query synthesis as the informative criterion and an instance-overlap-aware scheme as the representative criterion. The informative criterion identifies the input instances near the decision boundary, which speeds up instance selection and hence reduces the response time. On the set of identified instances, the representative criterion avoids selecting overlapped instances, which improves generalization performance. A meta-learning-based approach is used to set the values of the main parameters of the two criteria. A comparison with existing baseline solutions on artificial and real-world datasets shows that the proposed approach significantly reduces query instance selection time while improving generalization performance over the existing approaches.
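The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the parametric equation, the overlap measure, or the meta-learned parameters, so everything below is an assumption made for illustration. The sketch assumes the parametric equation is that of a line segment between two oppositely labeled instances, that synthesis means bisecting along this segment until the predicted label flips, and that two instances overlap when they lie closer than a fixed distance. The names `point_on_segment`, `synthesize_boundary_point`, `select_queries`, `k_near`, `min_gap`, and `budget` are all hypothetical.

```python
import math


def point_on_segment(a, b, t):
    """Parametric form of the segment from a to b: x(t) = a + t * (b - a)."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def synthesize_boundary_point(predict, a, b, steps=20):
    """Bisect on t to find a point on segment a-b where the predicted label flips.

    Assumes predict(a) != predict(b), so a label change exists on the segment.
    """
    label_a = predict(a)
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if predict(point_on_segment(a, b, mid)) == label_a:
            lo = mid  # still on a's side of the boundary
        else:
            hi = mid  # already on b's side of the boundary
    return point_on_segment(a, b, (lo + hi) / 2)


def select_queries(predict, pos, neg, pool, k_near=3, min_gap=0.5, budget=2):
    """Return indices of pool instances to send to the oracle.

    Informative step: keep the k_near pool instances closest to each
    synthesized boundary point. Representative step: greedily drop any
    candidate closer than min_gap to an already chosen instance.
    """
    candidates = []
    for a in pos:
        for b in neg:
            s = synthesize_boundary_point(predict, a, b)
            # Brute-force neighbour search for brevity; a spatial index
            # would be needed to avoid scanning the whole pool.
            order = sorted(range(len(pool)), key=lambda i: math.dist(pool[i], s))
            for i in order[:k_near]:
                if i not in candidates:
                    candidates.append(i)

    chosen = []
    for i in candidates:
        if len(chosen) == budget:
            break
        # Assumed overlap test: Euclidean distance below min_gap.
        if all(math.dist(pool[i], pool[j]) >= min_gap for j in chosen):
            chosen.append(i)
    return chosen
```

With a toy classifier that predicts 1 when the first coordinate is positive, one labeled instance per class at (2, 0) and (-2, 0), and a pool containing two nearly identical points beside the boundary, the sketch keeps only one of the near-duplicates and fills the remaining budget with a more distant boundary-adjacent point. In practice the fixed `k_near` and `min_gap` would be replaced by whatever values the meta-learning step recommends.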
Cite this article
Kumar, P., Gupta, A. Active instance selection via parametric equation and instance overlap aware scheme. Appl Intell 52, 994–1012 (2022). https://doi.org/10.1007/s10489-021-02395-2
DOI: https://doi.org/10.1007/s10489-021-02395-2