Active instance selection via parametric equation and instance overlap aware scheme

Applied Intelligence

Abstract

Active learning (AL) reduces human labeling effort by training a classifier on a small labeled set. This set is formed from the instances that reduce the generalization error the most. Although AL reduces the labeling cost, most pool-based query strategies evaluate every unlabeled instance in each iteration of query instance selection, which makes them computationally expensive. Query strategies also often select redundant or overlapping instances, which bring no improvement in generalization performance. This work proposes an advanced query strategy for the pool-based scenario. The strategy uses parametric-equation-based query synthesis as the informative criterion and an instance overlap aware scheme as the representative criterion. The informative criterion identifies the input instances near the decision boundary, which speeds up the instance selection process and hence reduces the response time. The representative criterion is then applied to this set of identified instances to avoid selecting overlapping instances, which improves the generalization performance. In addition, a meta-learning-based approach is used to determine the values of the main parameters of the formulated criteria. A comparison of the proposed approach with existing baseline solutions on artificial and real-world datasets demonstrates a significant reduction in query instance selection time, along with improved generalization performance over the existing approaches.
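The abstract describes the two criteria only at a high level, so the following Python sketch illustrates the general idea under stated assumptions; it is not the authors' algorithm. It assumes a binary classifier `predict`, two instances `a` and `b` that receive different predicted labels, and a parametric segment p(t) = a + t(b − a), 0 ≤ t ≤ 1, searched by bisection to synthesize a point on the decision boundary (the paper's parametric equation may differ). Pool instances nearest that point are taken as informative candidates, and a simple minimum-distance filter with a hypothetical `min_gap` parameter stands in for the instance overlap aware scheme. All function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def point_on_segment(a: Vector, b: Vector, t: float) -> List[float]:
    """Parametric equation of the segment from a to b: p(t) = a + t * (b - a)."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def boundary_point(a: Vector, b: Vector, predict: Classifier,
                   tol: float = 1e-3) -> List[float]:
    """Bisect on t to synthesize a point where the predicted label flips."""
    label_a = predict(a)
    if predict(b) == label_a:
        raise ValueError("a and b must receive different predicted labels")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict(point_on_segment(a, b, mid)) == label_a:
            lo = mid  # still on a's side of the boundary
        else:
            hi = mid  # already crossed the boundary
    return point_on_segment(a, b, (lo + hi) / 2)


def select_queries(a: Vector, b: Vector, pool: Sequence[Vector],
                   predict: Classifier, m: int = 3, min_gap: float = 0.1,
                   tol: float = 1e-3) -> List[int]:
    """Return indices of up to m pool instances near the boundary point."""
    p = boundary_point(a, b, predict, tol)
    # Informative step (sketch): rank pool instances by distance to the
    # synthesized point instead of scoring every instance with the classifier.
    candidates = sorted(range(len(pool)),
                        key=lambda i: math.dist(p, pool[i]))[:m]
    chosen: List[int] = []
    for i in candidates:
        # Hypothetical stand-in for the overlap aware criterion: reject a
        # candidate lying within min_gap of an instance already chosen.
        if all(math.dist(pool[i], pool[j]) >= min_gap for j in chosen):
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    predict = lambda x: int(x[0] + x[1] > 1.0)  # toy linear classifier
    pool = [[0.5, 0.5], [0.52, 0.5], [0.2, 0.9], [0.95, 0.95], [0.05, 0.1]]
    print(select_queries([0.0, 0.0], [1.0, 1.0], pool, predict))  # [0, 2]
```

In this sketch the classifier is called only about log2(1/tol) times per synthesized point rather than once per unlabeled instance, which mirrors the speed-up the abstract attributes to the informative criterion. The nearest-neighbour step still computes one distance per pool instance; a spatial index such as a k-d tree or an approximate nearest-neighbour search would be needed to avoid that linear scan.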





Author information

Corresponding author

Correspondence to Punit Kumar.

About this article

Cite this article

Kumar, P., Gupta, A. Active instance selection via parametric equation and instance overlap aware scheme. Appl Intell 52, 994–1012 (2022). https://doi.org/10.1007/s10489-021-02395-2

  • DOI: https://doi.org/10.1007/s10489-021-02395-2
