Three-way active learning through clustering selection

Min, Fan; Zhang, Shi-Ming; Ciucci, Davide; Wang, Min

doi:10.1007/s13042-020-01099-2

Original Article
Published: 03 March 2020

Volume 11, pages 1033–1046, (2020)
International Journal of Machine Learning and Cybernetics

Fan Min ORCID: orcid.org/0000-0002-3290-1036¹,
Shi-Ming Zhang¹,
Davide Ciucci² &
…
Min Wang³

790 Accesses
29 Citations

Abstract

In clustering-based active learning, the performance of the learner relies heavily on the quality of the clustering results. Empirical studies have shown that different clustering techniques suit different data. In this paper, we propose the three-way active learning through clustering selection (TACS) algorithm, which dynamically selects the appropriate clustering techniques during the learning process. The algorithm follows the coarse-to-fine scheme of granular computing, coupled with three-way instance processing. For label query, we select both representative instances with density peaks and informative instances with the maximal total distance. For block partition, we revise six popular clustering techniques to speed up learning and accommodate binary splitting. For clustering evaluation, we define weighted entropy with 1-nearest-neighbor. For insufficient labels, we design tree pruning techniques that use a block queue. Experiments are undertaken on twelve UCI datasets. The results show that TACS is superior to algorithms based on a single clustering technique and to other state-of-the-art active learning algorithms.
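The idea of choosing representative instances by density peaks can be illustrated with a minimal sketch. It follows the general density-peaks scheme (local density times distance to the nearest denser point) and is not the TACS procedure itself, which the abstract does not specify. The function names `density_peak_scores` and `select_representatives`, the cutoff distance `dc`, and the product `rho * delta` used for ranking are all assumptions made for illustration.

```python
import numpy as np

def density_peak_scores(X, dc):
    """Return (rho, delta, score) for each row of X.

    rho:   number of other points closer than the cutoff distance dc
    delta: distance to the nearest point ranked as denser
           (for the densest point: its largest distance to any point)
    score: rho * delta, large for dense points far from denser ones
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: neighbours within dc, excluding the point itself.
    rho = (dist < dc).sum(axis=1) - 1
    # Rank points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(X))
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, len(order)):
        i = order[rank]
        # Nearest point among those ranked before i.
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta, rho * delta

def select_representatives(X, dc, k):
    """Indices of the k instances with the highest rho * delta score."""
    _, _, score = density_peak_scores(X, dc)
    return np.argsort(-score, kind="stable")[:k]
```

Called on two well-separated groups of points with `k=2`, the sketch would be expected to return the most central point of each group, which is the kind of instance a learner might query first. The quadratic distance matrix keeps the sketch short but limits it to small datasets.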

Notes

www.weka.com.

Acknowledgements

This work is in part supported by the Natural Science Foundation of Sichuan Province under Grant number 2019YJ0314, and the Sichuan Province Youth Science and Technology Innovation Team under Grant number 2019JDTD0017.

Author information

Authors and Affiliations

School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
Fan Min & Shi-Ming Zhang
DISCo, University of Milano-Bicocca, viale Sarca 336/14, 20126, Milan, Italy
Davide Ciucci
School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu, 610500, China
Min Wang

Corresponding author

Correspondence to Fan Min.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Min, F., Zhang, SM., Ciucci, D. et al. Three-way active learning through clustering selection. Int. J. Mach. Learn. & Cyber. 11, 1033–1046 (2020). https://doi.org/10.1007/s13042-020-01099-2

Received: 24 September 2019
Accepted: 17 February 2020
Published: 03 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s13042-020-01099-2
