Abstract
Pairwise constraints, a cheaper form of supervision that does not require revealing the class labels of data points, were initially proposed to improve the performance of clustering algorithms. More recently, researchers have become interested in using them for feature selection. However, in most current methods, pairwise constraints are provided passively: they are generated randomly over multiple algorithmic runs, and the results are averaged. This requires a large number of constraints, which may be redundant, unnecessary, and, under some circumstances, even detrimental to the algorithm's performance. It also masks the individual effect of each constraint set and imposes a human labor cost. Therefore, in this paper, we propose a framework for actively selecting and then propagating constraints for feature selection. To this end, we use the graph Laplacian defined on the similarity matrix. We assume that if a small perturbation of the similarity value between a pair of data points leads to a better-separated cluster indicator, based on the second eigenvector of the graph Laplacian, then this pair is expected to be a pairwise query of higher and more significant impact. Constraint propagation, in turn, increases the amount of supervision information while reducing the human labor cost. Finally, experimental results validate our proposal in comparison with other well-known feature selection methods and show that it performs favorably.
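The abstract compresses several technical steps into a few sentences. The Python/NumPy sketch below illustrates them on toy data, under assumptions of our own:

- a fully connected Gaussian similarity matrix W;
- the unnormalized Laplacian L = D - W;
- the sign of its second eigenvector v_2 as the two-way cluster indicator;
- first-order eigenvector perturbation theory (cf. Stewart and Sun 1990) to measure how strongly a change in one similarity value w_ij moves v_2.

The selection rule in the hypothetical `select_query` picks the pair with the largest first-order change of v_2. It is only a stand-in; it is not the paper's criterion for a "better-separated" indicator. The hypothetical `propagate_constraints` uses the closed-form propagation of Lu and Ip (2010), which may differ from the propagation the authors actually use. All function names and parameters (`sigma`, `alpha`) are illustrative.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Fully connected Gaussian similarity matrix with a zero diagonal (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def laplacian_eig(W):
    """Eigenpairs of the unnormalized Laplacian L = D - W, eigenvalues ascending."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigh(L)


def fiedler_derivative(vals, vecs, i, j):
    """First-order change of v_2 (column 1, 0-based) per unit increase of w_ij = w_ji.

    Raising w_ij by delta adds delta * (e_i - e_j)(e_i - e_j)^T to L, so
    dv_2 = sum_{k != 2} (v_k^T dL v_2) / (lam_2 - lam_k) * v_k.
    """
    v2 = vecs[:, 1]
    dv2 = np.zeros_like(v2)
    for k in range(len(vals)):
        gap = vals[1] - vals[k]
        if k == 1 or abs(gap) < 1e-12:
            continue  # skip v_2 itself and (near-)degenerate partners
        vk = vecs[:, k]
        coupling = (vk[i] - vk[j]) * (v2[i] - v2[j])  # equals v_k^T dL v_2
        dv2 += coupling / gap * vk
    return dv2


def select_query(vals, vecs):
    """Hypothetical rule: query the pair whose similarity value moves v_2 the most."""
    n = vecs.shape[0]
    best_pair, best_score = None, -1.0
    for i in range(n):
        for j in range(i + 1, n):
            score = np.linalg.norm(fiedler_derivative(vals, vecs, i, j))
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score


def propagate_constraints(W, Z, alpha=0.5):
    """Closed-form pairwise constraint propagation in the style of Lu and Ip (2010).

    Z is symmetric: +1 for must-link, -1 for cannot-link, 0 for unknown.
    F = (1 - alpha)^2 (I - alpha S)^{-1} Z (I - alpha S)^{-1}, S = D^{-1/2} W D^{-1/2}.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    P = np.linalg.inv(np.eye(len(W)) - alpha * S)  # invertible for 0 < alpha < 1
    return (1.0 - alpha) ** 2 * P @ Z @ P


# Toy data: two Gaussian blobs of 10 points each (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
W = gaussian_similarity(X, sigma=1.0)
vals, vecs = laplacian_eig(W)
labels = (vecs[:, 1] > 0).astype(int)  # sign of v_2 as a two-way cluster indicator
pair, score = select_query(vals, vecs)

Z = np.zeros_like(W)
Z[0, 1] = Z[1, 0] = 1.0     # one must-link inside the first blob
Z[0, 10] = Z[10, 0] = -1.0  # one cannot-link across blobs
F = propagate_constraints(W, Z)
```

Scoring all candidate pairs needs a single eigendecomposition, because every pair reuses the same eigenpairs. A cheap sanity check for `fiedler_derivative` works as follows: perturb w_ij by a small delta, recompute v_2, align its sign with the original, and compare the finite difference with the first-order prediction. In the propagated matrix F, unqueried pairs inside a blob receive positive (must-link-like) values, while pairs across blobs receive negative (cannot-link-like) values.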
Notes
can be downloaded at https://www-lisic.univ-littoral.fr/~porebski/Recherche: Porebski et al. (2018).
References
Abin AA, Beigy H (2014) Active selection of clustering constraints: a sequential approach. Pattern Recognit 47(3):1443–1458
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750
Basu S, Banerjee A, Mooney RJ (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the 2004 SIAM international conference on data mining, SIAM, pp 333–344
Benabdeslem K, Hindawi M (2014) Efficient semi-supervised feature selection: constraint, relevance, and redundancy. IEEE Trans Knowl Data Eng 26(5):1131–1143
Benabdeslem K, Elghazel H, Hindawi M (2016) Ensemble constrained laplacian score for efficient and robust semi-supervised feature selection. Knowl Inf Syst 49(3):1161–1185
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Cheng Y, Cai Y, Sun Y, Li J (2008) Semi-supervised feature selection under logistic I-RELIEF framework. In: 19th international conference on pattern recognition (ICPR 2008), IEEE, pp 1–4
Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: European conference on principles of data mining and knowledge discovery, Springer, pp 115–126
Gilad-Bachrach R, Navot A, Tishby N (2004) Margin based feature selection - theory and algorithms. In: Proceedings of the twenty-first international conference on Machine learning, ACM, p 43
Givoni I, Frey B (2009) Semi-supervised affinity propagation with instance-level constraints. In: Artificial intelligence and statistics, pp 161–168
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Advances in neural information processing systems, pp 507–514
Hijazi S, Kalakech M, Hamad D, Kalakech A (2018) Feature selection approach based on hypothesis-margin and pairwise constraints. In: 2018 IEEE Middle East and North Africa communications conference (MENACOMM), IEEE, pp 1–6
Hindawi M, Allab K, Benabdeslem K (2011) Constraint selection-based semi-supervised feature selection. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 1080–1085
Huang L, Yan D, Taft N, Jordan MI (2009) Spectral clustering with perturbed data. In: Advances in neural information processing systems, pp 705–712
Jiang Y, Ren J (2011) Eigenvector sensitive feature selection for spectral clustering. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 114–129
Kalakech M, Biela P, Macaire L, Hamad D (2011) Constraint scores for semi-supervised feature selection: a comparative study. Pattern Recognit Lett 32(5):656–665
Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: International joint conference on artificial intelligence, Stanford InfoLab
Kira K, Rendell LA (1992) A practical approach to feature selection. In: Proceedings of the ninth international workshop on Machine learning, pp 249–256
Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. Tech. rep., Stanford
Kononenko I (1994) Estimating attributes: analysis and extensions of relief. In: European conference on machine learning. Springer, pp 171–182
Li Z, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. In: Proceedings of the 25th international conference on Machine learning. ACM, pp 576–583
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Liu H, Motoda H, Yu L (2004) A selective sampling approach to active feature selection. Artif Intell 159(1–2):49–74
Lu Z, Carreira-Perpinan MA (2008) Constrained spectral clustering through affinity propagation. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Lu Z, Ip HH (2010) Constrained spectral clustering via exhaustive and efficient constraint propagation. In: European conference on computer vision. Springer, pp 1–14
Mäenpää T, Pietikäinen M (2004) Classification with color and texture: jointly or separately? Pattern Recognit 37(8):1629–1640
Mallapragada PK, Jin R, Jain AK (2008) Active query selection for semi-supervised clustering. In: 19th international conference on pattern recognition, 2008. ICPR 2008. IEEE, pp 1–4
Ning H, Xu W, Chi Y, Gong Y, Huang TS (2010) Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recognit 43(1):113–127
Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recognit 29(1):51–59
Ojala T, Maenpaa T, Pietikainen M, Viertola J, Kyllonen J, Huovinen S (2002) Outex - new framework for empirical evaluation of texture analysis algorithms. IEEE object recognition supported by user interaction for service robots 1:701–706
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Pietikäinen M, Mäenpää T, Viertola J (2002) Color texture classification with color histograms and local binary patterns. In: Workshop on texture analysis in machine vision, pp 109–112
Porebski A, Vandenbroucke N, Hamad D (2013) LBP histogram selection for supervised color texture classification. In: 2013 IEEE international conference on image processing, IEEE, pp 3239–3243
Porebski A, Hoang VT, Vandenbroucke N, Hamad D (2018) Multi-color space local binary pattern-based feature selection for texture classification. J Electron Imaging 27(1):011010
Qian M, Zhai C (2013) Robust unsupervised feature selection. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1621–1627
Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recognit 64:141–158
Shi L, Du L, Shen YD (2014) Robust spectral learning for unsupervised feature selection. In: 2014 IEEE international conference on data mining. IEEE, pp 977–982
Sotoca JM, Pla F (2010) Supervised feature selection by clustering using conditional mutual information-based distances. Pattern Recognit 43(6):2068–2081
Stewart GW, Sun JG (1990) Matrix perturbation theory. Academic Press, Cambridge
Sun Y, Li J (2006) Iterative relief for feature weighting. In: Proceedings of the 23rd international conference on Machine learning. ACM, pp 913–920
Urbanowicz RJ, Meeker M, LaCava W, Olson RS, Moore JH (2017) Relief-based feature selection: introduction and review. arXiv preprint arXiv:1711.08421
Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. ICML 1:577–584
Wagstaff KL, desJardins M, Xu Q (2005) Active constrained clustering by examining spectral eigenvectors. Jet Propulsion Laboratory, National Aeronautics and Space Administration, Pasadena, CA
Wang S, Tang J, Liu H (2015) Embedded unsupervised feature selection. In: Twenty-ninth AAAI conference on artificial intelligence
Wang X, Wang J, Qian B, Wang F, Davidson I (2014) Self-taught spectral clustering via constraint augmentation. In: Proceedings of the 2014 SIAM international conference on data mining, SIAM, pp 416–424
Wauthier FL, Jojic N, Jordan MI (2012) Active spectral clustering via iterative uncertainty reduction. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 1339–1347
Xiong C, Johnson DM, Corso JJ (2017) Active clustering with model-based uncertainty reduction. IEEE Trans Pattern Anal Mach Intell 39(1):5–17
Xu Q, Wagstaff KL et al (2005) Active constrained clustering by examining spectral eigenvectors. In: International conference on discovery science. Springer, pp 294–307
Yang M, Song J (2010) A novel hypothesis-margin based approach for feature selection with side pairwise constraints. Neurocomputing 73(16):2859–2872
Zelnik-Manor L, Perona P (2005) Self-tuning spectral clustering. In: Advances in neural information processing systems, pp 1601–1608
Zhang D, Chen S, Zhou ZH (2008) Constraint score: a new filter method for feature selection with pairwise constraints. Pattern Recognit 41(5):1440–1451
Zhao Z, Liu H (2007) Semi-supervised feature selection via spectral analysis. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 641–646
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in neural information processing systems, pp 321–328
Zoidi O, Tefas A, Nikolaidis N, Pitas I (2014) Person identity label propagation in stereo videos. IEEE Trans Multimed 16(5):1358–1368
Acknowledgements
This work was funded in part by the Agence universitaire de la Francophonie (AUF) and the University of the Littoral Opal Coast (ULCO) in France, together with the National Council for Scientific Research in Lebanon, as part of the ARCUS E2D2 project.
About this article
Cite this article
Hijazi, S., Hamad, D., Kalakech, M. et al. Active learning of constraints for weighted feature selection. Adv Data Anal Classif 15, 337–377 (2021). https://doi.org/10.1007/s11634-020-00408-5
DOI: https://doi.org/10.1007/s11634-020-00408-5
Keywords
- Feature selection
- Active learning
- Pairwise constraint selection
- Constraint propagation
- Graph Laplacian
- Uncertainty reduction
- Matrix perturbation