Abstract
Feature selection for dynamic data sets is a significant and active research problem in data mining. In practice, most real-world data are hybrid, containing both categorical and numerical attributes. For dynamic hybrid data, this paper first introduces a new neighborhood relation and a corresponding neighborhood-based information entropy. Second, single-object and group incremental mechanisms for updating feature significance are analyzed and proved. On this basis, two incremental feature selection algorithms for hybrid data are developed. To evaluate the new algorithms, experiments are conducted with four common classifiers on twelve UCI data sets. The results validate the feasibility of both incremental algorithms and, in particular, the efficiency of the group incremental algorithm.
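As a rough, illustrative sketch of the kind of construction the abstract describes (not the authors' exact formulation): for hybrid data, a neighborhood can be formed by requiring categorical attributes to match exactly while min-max normalized numerical attributes stay within a radius delta, and a neighborhood information entropy can then be computed from the neighborhood sizes. The function names, the max-distance metric, and the threshold value below are assumptions introduced purely for illustration.

```python
import numpy as np

def hybrid_neighborhood(X_num, X_cat, i, delta=0.15):
    """Return indices of samples in the neighborhood of sample i.

    Illustrative rule (an assumption, not necessarily the paper's definition):
    categorical attributes must match exactly, and the largest absolute
    difference over the normalized numerical attributes must not exceed delta.
    """
    num_close = np.max(np.abs(X_num - X_num[i]), axis=1) <= delta
    cat_equal = np.all(X_cat == X_cat[i], axis=1)
    return np.where(num_close & cat_equal)[0]

def neighborhood_entropy(X_num, X_cat, delta=0.15):
    """Neighborhood information entropy of the current attribute subset:
    H = -(1/n) * sum_i log2(|N(i)| / n), where N(i) is the neighborhood
    of sample i under the hybrid relation above.
    """
    n = X_num.shape[0]
    sizes = np.array([hybrid_neighborhood(X_num, X_cat, i, delta).size
                      for i in range(n)])
    return -np.mean(np.log2(sizes / n))

# Toy example: two normalized numerical attributes and one categorical attribute.
X_num = np.array([[0.10, 0.20], [0.12, 0.22], [0.90, 0.80]])
X_cat = np.array([["a"], ["a"], ["b"]])
print(neighborhood_entropy(X_num, X_cat))  # smaller neighborhoods -> higher entropy
```

Under a definition of this kind, the significance of a candidate feature can be assessed by the change in neighborhood entropy when it is added to the current subset, and the incremental mechanisms update this quantity as single objects or groups of objects arrive, rather than recomputing it from scratch.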













Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Funding
This study was funded by the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi Province, China (No. 2016111), the Applied Basic Research Programs of Shanxi Province (No. 201801D221170), the Research Project Supported by the Shanxi Scholarship Council of China (No. 2021-007), and the National Natural Science Foundation of China (No. 61876103).
Ethics declarations
Conflict of interest
The authors certify that there is no conflict of interest with any individual or organization regarding the present work.
Human and animal participants
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, F., Wei, W. & Liang, J. A group incremental approach for feature selection on hybrid data. Soft Comput 26, 3663–3677 (2022). https://doi.org/10.1007/s00500-022-06838-x