Abstract
Feature selection for dynamic data sets is a significant and active research problem in data mining. In practice, most real-world data are hybrid, containing both categorical and numerical attributes. For dynamic hybrid data, this paper first introduces a new neighborhood relation and a corresponding neighborhood-based information entropy. Second, single-object and group incremental mechanisms for updating feature significance are analyzed and proved. On this basis, two incremental feature selection algorithms for hybrid data are developed. To evaluate the new algorithms, experiments are conducted with four common classifiers on twelve UCI data sets. The results validate the feasibility of both incremental algorithms and, in particular, the efficiency of the group incremental algorithm.
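As a rough, illustrative sketch of the kind of construction the abstract describes (not the authors' exact formulation): for hybrid data, a neighborhood can be formed by requiring categorical attributes to match exactly while min-max normalized numerical attributes stay within a radius delta, and a neighborhood information entropy can then be computed from the neighborhood sizes. The function names, the max-distance metric, and the threshold value below are assumptions introduced purely for illustration.

```python
import numpy as np

def hybrid_neighborhood(X_num, X_cat, i, delta=0.15):
    """Return indices of samples in the neighborhood of sample i.

    Illustrative rule (an assumption, not necessarily the paper's definition):
    categorical attributes must match exactly, and the largest absolute
    difference over the normalized numerical attributes must not exceed delta.
    """
    num_close = np.max(np.abs(X_num - X_num[i]), axis=1) <= delta
    cat_equal = np.all(X_cat == X_cat[i], axis=1)
    return np.where(num_close & cat_equal)[0]

def neighborhood_entropy(X_num, X_cat, delta=0.15):
    """Neighborhood information entropy of the current attribute subset:
    H = -(1/n) * sum_i log2(|N(i)| / n), where N(i) is the neighborhood
    of sample i under the hybrid relation above.
    """
    n = X_num.shape[0]
    sizes = np.array([hybrid_neighborhood(X_num, X_cat, i, delta).size
                      for i in range(n)])
    return -np.mean(np.log2(sizes / n))

# Toy example: two normalized numerical attributes and one categorical attribute.
X_num = np.array([[0.10, 0.20], [0.12, 0.22], [0.90, 0.80]])
X_cat = np.array([["a"], ["a"], ["b"]])
print(neighborhood_entropy(X_num, X_cat))  # smaller neighborhoods -> higher entropy
```

Under a definition of this kind, the significance of a candidate feature can be assessed by the change in neighborhood entropy when it is added to the current subset, and the incremental mechanisms update this quantity as single objects or groups of objects arrive, rather than recomputing it from scratch.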













Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Funding
This study was funded by the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi Province, China (No. 2016111), the Applied Basic Research Programs of Shanxi Province (No. 201801D221170), the Research Project Supported by the Shanxi Scholarship Council of China (No. 2021-007), and the National Natural Science Foundation of China (No. 61876103).
Ethics declarations
Conflict of interest
The authors certify that there is no conflict of interest with any individual or organization regarding the present work.
Human and animal participants
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, F., Wei, W. & Liang, J. A group incremental approach for feature selection on hybrid data. Soft Comput 26, 3663–3677 (2022). https://doi.org/10.1007/s00500-022-06838-x