Abstract
The ability to classify unseen objects, namely generalization ability, remains a long-standing challenge for rough set-based classifiers. Current research mainly focuses on introducing thresholds to tolerate some errors on seen objects. However, both the rationale for introducing thresholds and the selection of threshold values still lack sufficient theoretical support. The structural risk minimization (SRM) inductive principle is one of the most effective theories for controlling generalization ability; it suggests a trade-off between errors on seen objects and complexity. Therefore, this paper introduces the SRM principle into the rough set-based classifier and proposes an SRM algorithm for the rough set-based classifier, called the SRM-R algorithm. The SRM-R algorithm uses the number of rules to characterize the actual complexity of the rough set-based classifier and obtains the optimal trade-off between errors on seen objects and complexity through genetic multi-objective optimization. A tenfold cross-validation experiment on 12 UCI datasets shows that the SRM-R algorithm significantly improves generalization ability compared with the conventional threshold algorithm. In addition, this paper uses two other possible complexity metrics, the number of attributes and the attribute space, to construct corresponding SRM algorithms and compares their classification accuracy with that of the SRM-R algorithm. The comparison shows that the SRM-R algorithm obtains the best classification accuracy, which indicates that the number of rules characterizes complexity more effectively than the number of attributes or the attribute space. Further experiments show that the SRM-R algorithm obtains fewer rules and a larger support coefficient, which means it extracts stronger rules. To some extent, this explains why it achieves better generalization ability.
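The trade-off described above, between errors on seen objects and the number of rules, can be illustrated with a small Pareto-front filter. This is only a minimal sketch and not the authors' genetic procedure: the names `Candidate`, `dominates`, and `pareto_front` are hypothetical, and the candidate values are invented for illustration rather than taken from the paper. It assumes each candidate rule set is summarized by a pair (training error rate, number of rules), with both objectives to be minimized.

```python
from typing import List, Tuple

# Assumption: a candidate rule set is summarized as
# (training error rate, number of rules); both are minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidates, for illustration only (not results from the paper).
candidates = [
    (0.00, 40),  # fits the seen objects perfectly but needs many rules
    (0.02, 25),
    (0.05, 12),
    (0.05, 30),  # dominated by (0.02, 25)
    (0.10, 12),  # dominated by (0.05, 12)
    (0.20, 5),   # very few rules but many errors
]

front = pareto_front(candidates)
for error, n_rules in front:
    print(f"error={error:.2f}, rules={n_rules}")
```

A multi-objective genetic algorithm would search for such non-dominated candidates instead of enumerating them; how a single solution is then chosen from the front is not specified in the abstract and is left open here.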
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2017YFB0902100) and the National Science and Technology Major Project of China (No. 2017-I-0007-0008). The authors would like to thank the anonymous reviewers for their careful reading of the paper and their valuable suggestions for refining this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of this article.
Human and animal rights statement
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, J., Bai, M., Jiang, N. et al. Structural risk minimization of rough set-based classifier. Soft Comput 24, 2049–2066 (2020). https://doi.org/10.1007/s00500-019-04038-8
DOI: https://doi.org/10.1007/s00500-019-04038-8