
Rough set methods in feature selection via submodular function

  • Methodologies and Application

Abstract

Attribute reduction is an important problem in data mining and machine learning: it highlights informative features and reduces the risk of over-fitting, thereby improving learning performance. In this regard, rough sets offer an attractive framework. A reduct in rough set theory is a subset of attributes/features that is jointly sufficient and individually necessary to satisfy a given criterion. Excessive attributes may reduce diversity and increase correlation among features, while a smaller attribute set can achieve nearly equal or even higher classification accuracy with some classifiers. This motivates us to address dimensionality reduction through attribute reduction from the joint viewpoint of learning performance and reduct size. In this paper, we propose a new attribute reduction criterion that selects as few attributes as possible while largely preserving the performance of the corresponding learning algorithms. The main contributions of this work are twofold. First, we define the concept of a k-approximate-reduct, relaxing the restriction to a minimum reduct, which offers a useful perspective on the connection between the size of an attribute reduct and learning performance. Second, we develop a greedy algorithm for attribute reduction based on mutual information and use submodular functions to analyze its convergence. The diminishing-returns property of submodularity provides a solid guarantee of the soundness of the k-approximate-reduct. Notably, rough sets serve as an effective tool to estimate the marginal and joint probability distributions over attributes required by mutual information. Extensive experiments on six real-world public datasets from the UCI machine learning repository demonstrate that the subset selected by the mutual information reduct achieves higher accuracy with fewer attributes when building naive Bayes and radial basis function network classifiers.
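The guarantee mentioned in the abstract rests on submodularity. A set function f over attributes has diminishing returns, i.e., is submodular, when f(A ∪ {a}) - f(A) ≥ f(B ∪ {a}) - f(B) for all A ⊆ B and every attribute a ∉ B. For a monotone submodular objective, the classic greedy analysis shows that selecting k elements by largest marginal gain attains at least a (1 - 1/e) fraction of the optimal value; a bound of this kind is what justifies stopping at a k-approximate-reduct instead of insisting on a minimum reduct.

The paper's exact procedure is not reproduced on this page, so the following is only a minimal sketch of such a greedy mutual-information reduct for discrete attributes; the names greedy_k_reduct and mutual_information are illustrative, not taken from the paper. Probabilities are estimated by counting over equivalence classes, which is the role the abstract assigns to rough sets.

    import numpy as np
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (bits) of a discrete label sequence.
        n = len(labels)
        return -sum((c / n) * np.log2(c / n) for c in Counter(labels).values())

    def mutual_information(X, subset, y):
        # I(S; y) = H(y) - H(y | S). H(y | S) is estimated from the
        # equivalence classes induced by the attribute subset S:
        # rows that agree on every attribute in S fall into one class.
        if not subset:
            return 0.0
        groups = {}
        for i, row in enumerate(X[:, subset]):
            groups.setdefault(tuple(row), []).append(i)
        n = len(y)
        h_cond = sum(len(idx) / n * entropy(y[idx]) for idx in groups.values())
        return entropy(y) - h_cond

    def greedy_k_reduct(X, y, k, eps=1e-6):
        # Greedily add the attribute with the largest marginal MI gain;
        # stop after k attributes or when the gain drops below eps,
        # mirroring the k-approximate-reduct trade-off between reduct
        # size and learning performance.
        X, y = np.asarray(X), np.asarray(y)
        selected, remaining = [], list(range(X.shape[1]))
        current = 0.0
        while remaining and len(selected) < k:
            gain, best = max((mutual_information(X, selected + [a], y) - current, a)
                             for a in remaining)
            if gain < eps:
                break
            selected.append(best)
            remaining.remove(best)
            current += gain
        return selected

For instance, greedy_k_reduct(X, y, k=10) returns at most ten attribute indices; the eps threshold implements the early stop that keeps the subset small, while the diminishing-returns bound above controls how much objective value is given up.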


Notes

  1. http://archive.ics.uci.edu/ml/.



Acknowledgments

This study was funded by the National Natural Science Foundation of China (Grant No. 61379049).

Author information


Corresponding author

Correspondence to William Zhu.

Ethics declarations

Conflict of interest

All the authors declare that they have no conflict of interest.

Human and animal studies

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.


About this article


Cite this article

Zhu, XZ., Zhu, W. & Fan, XN. Rough set methods in feature selection via submodular function. Soft Comput 21, 3699–3711 (2017). https://doi.org/10.1007/s00500-015-2024-7
