Tri-partition cost-sensitive active learning through kNN

  • Methodologies and Application
  • Journal: Soft Computing

Abstract

Active learning differs from the training–testing scenario in that class labels can be obtained upon request. It is widely employed in applications where the labeling of instances incurs a heavy manual cost. In this paper, we propose a new algorithm called tri-partition active learning through k-nearest neighbors (TALK). The optimization objective is to minimize the total teacher and misclassification costs. First, a k-nearest neighbors classifier is employed to divide unlabeled instances into three disjoint regions. Region I contains instances for which the expected misclassification cost is lower than the teacher cost, Region II contains instances to be labeled by human experts, and Region III contains the remaining instances. Various strategies are designed to determine which instances are in Region II. Second, instances in Regions I and II are labeled and added to the training set, and the tri-partition process is repeated until all instances have been labeled. Experiments are undertaken on eight University of California, Irvine, datasets using different cost settings. Compared with the state-of-the-art cost-sensitive classification and active learning algorithms, our new algorithm generally exhibits a lower total cost.
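The tri-partition loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact rules, so the sketch assumes that the kNN vote fraction serves as the class-probability estimate, that the expected misclassification cost is (1 - vote fraction) times a single misclassification cost, and that Region II is filled with the `batch` least confident remaining instances. The names `knn_vote`, `tri_partition`, `talk_sketch`, and all parameters are hypothetical.

```python
import math
from collections import Counter


def knn_vote(x, labeled, k):
    """Return (majority label, vote fraction) among the k nearest labeled points."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def tri_partition(pool, labeled, k, teacher_cost, mis_cost, batch):
    """Split the pool into Regions I, II and III (assumed rules, see lead-in)."""
    region1, rest = {}, []
    for i, x in pool.items():
        label, frac = knn_vote(x, labeled, k)
        expected_cost = (1.0 - frac) * mis_cost  # assumed cost estimate
        if expected_cost < teacher_cost:
            region1[i] = label          # Region I: cheap enough to auto-label
        else:
            rest.append((frac, i))
    rest.sort()                         # least confident instances first
    region2 = [i for _, i in rest[:batch]]   # Region II: ask the expert
    region3 = [i for _, i in rest[batch:]]   # Region III: wait for next round
    return region1, region2, region3


def talk_sketch(points, oracle, seed, k=3, teacher_cost=1.0, mis_cost=4.0, batch=2):
    """Repeat the tri-partition until every instance is labeled.

    Returns (total cost, number of expert queries, number of misclassifications).
    `oracle` stands in for the human expert and is also used to count errors.
    """
    labeled = [(points[i], oracle[i]) for i in seed]
    pool = {i: x for i, x in enumerate(points) if i not in seed}
    queries, errors = len(seed), 0      # seed labels are counted as queries
    while pool:
        region1, region2, _ = tri_partition(
            pool, labeled, k, teacher_cost, mis_cost, batch
        )
        for i, label in region1.items():
            errors += label != oracle[i]
            labeled.append((pool.pop(i), label))
        for i in region2:
            queries += 1
            labeled.append((pool.pop(i), oracle[i]))
    return queries * teacher_cost + errors * mis_cost, queries, errors


if __name__ == "__main__":
    # Two well-separated toy clusters; indices 0 and 4 are the initial labels.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    print(talk_sketch(pts, truth, seed=[0, 4]))
```

Because Region II always takes at least one instance whenever Region I does not cover the whole pool, the pool shrinks in every round and the loop terminates. The query strategy here (least confident first) is only one possible choice; the abstract states that several strategies for Region II were designed.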

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (Grant No. 61379089) and the Natural Science Foundation of the Department of Education of Sichuan Province (Grant No. 16ZA0060).

Author information

Corresponding author

Correspondence to Fan Min.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Min, F., Liu, FL., Wen, LY. et al. Tri-partition cost-sensitive active learning through kNN. Soft Comput 23, 1557–1572 (2019). https://doi.org/10.1007/s00500-017-2879-x

  • DOI: https://doi.org/10.1007/s00500-017-2879-x
