
A group incremental approach for feature selection on hybrid data


Abstract

Feature selection for dynamic data sets is an important and active research problem in data mining. In practice, most real-world data are hybrid, containing both categorical and numerical attributes. For dynamic hybrid data, this paper first introduces a new neighborhood relation and a corresponding neighborhood-based information entropy. Second, single-instance and group incremental mechanisms for updating feature significance are analyzed and proved. On this basis, two incremental feature selection approaches for hybrid data are developed. To evaluate the new algorithms, experiments are conducted with four common classifiers on twelve UCI data sets. The experimental results validate the feasibility of the incremental algorithms and, in particular, the efficiency of the group incremental algorithm.
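To make the kind of construction described above concrete, the sketch below gives a minimal, non-incremental baseline in Python. It assumes a common neighborhood definition for hybrid data (exact match on categorical attributes and a δ threshold on min-max normalized numerical attributes) and a standard neighborhood conditional entropy; the function names, the threshold delta, and the greedy stopping rule are illustrative assumptions, not the paper's exact definitions or its incremental update mechanisms.

```python
import numpy as np

def neighborhood_matrix(X_num, X_cat, num_idx, cat_idx, delta=0.15):
    # Boolean n x n matrix: entry (i, j) is True iff x_j lies in the
    # neighborhood of x_i under the chosen hybrid feature subset.
    # Categorical features must match exactly; min-max normalized numerical
    # features must differ by at most delta (a common neighborhood
    # rough-set construction, assumed here for illustration).
    n = X_num.shape[0]
    mask = np.ones((n, n), dtype=bool)
    for j in cat_idx:
        col = X_cat[:, j]
        mask &= (col[:, None] == col[None, :])
    for j in num_idx:
        col = X_num[:, j].astype(float)
        rng = col.max() - col.min()
        col = (col - col.min()) / rng if rng > 0 else np.zeros_like(col)
        mask &= (np.abs(col[:, None] - col[None, :]) <= delta)
    return mask

def conditional_neighborhood_entropy(mask, y):
    # H(D | B) = -(1/n) * sum_i log(|N_B(x_i) ∩ [x_i]_D| / |N_B(x_i)|);
    # small values mean neighborhoods are pure with respect to the class.
    same_class = (y[:, None] == y[None, :])
    consistent = (mask & same_class).sum(axis=1)
    return -np.mean(np.log(consistent / mask.sum(axis=1)))

def forward_feature_selection(X_num, X_cat, y, delta=0.15):
    # Greedy non-incremental baseline: repeatedly add the remaining feature
    # whose inclusion most reduces the conditional neighborhood entropy,
    # and stop when no candidate improves it.
    remaining = [("num", j) for j in range(X_num.shape[1])] + \
                [("cat", j) for j in range(X_cat.shape[1])]
    selected, num_sel, cat_sel = [], [], []
    best = np.inf
    while remaining:
        scores = []
        for kind, j in remaining:
            num_try = num_sel + [j] if kind == "num" else num_sel
            cat_try = cat_sel + [j] if kind == "cat" else cat_sel
            m = neighborhood_matrix(X_num, X_cat, num_try, cat_try, delta)
            scores.append(conditional_neighborhood_entropy(m, y))
        k = int(np.argmin(scores))
        if scores[k] >= best - 1e-12:   # no further improvement
            break
        best = scores[k]
        kind, j = remaining.pop(k)
        (num_sel if kind == "num" else cat_sel).append(j)
        selected.append((kind, j))
    return selected

if __name__ == "__main__":
    # Tiny synthetic hybrid table (assumed data, purely for illustration).
    X_num = np.array([[0.1], [0.2], [0.9], [0.8]])
    X_cat = np.array([["a"], ["a"], ["b"], ["b"]])
    y = np.array([0, 0, 1, 1])
    print(forward_feature_selection(X_num, X_cat, y))
```

The incremental and group incremental approaches described in the paper would instead update the neighborhood information and entropy values when new objects arrive, rather than recomputing everything from scratch as this baseline does, which is where the reported efficiency gains come from.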





Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.


Funding

This study was funded by the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi Province, China (No. 2016111), the Applied Basic Research Programs of Shanxi Province (No. 201801D221170), a Research Project Supported by the Shanxi Scholarship Council of China (No. 2021-007) and the National Natural Science Foundation of China (No. 61876103).

Author information


Corresponding author

Correspondence to Jiye Liang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest with any individual or organization regarding the present work.

Human and animal participants

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, F., Wei, W. & Liang, J. A group incremental approach for feature selection on hybrid data. Soft Comput 26, 3663–3677 (2022). https://doi.org/10.1007/s00500-022-06838-x
