Abstract
In data mining applications, test-cost-sensitive attribute reduction is an important task that aims to decrease the test cost of data. In operations research, the set cover problem is a classical optimization problem with a much longer history of investigation than the attribute reduction problem. In this paper, we apply methods for the set cover problem to test-cost-sensitive attribute reduction. First, we transform the test-cost-sensitive reduction problem into an equivalent set cover problem by a constructive approach. We show that computing a reduct of a decision system with minimal test cost is equivalent to computing an optimal solution of the set cover problem. Then, a set-cover-based heuristic algorithm is introduced to solve the test-cost-sensitive reduction problem. Finally, we conduct several numerical experiments on data sets from the UCI Machine Learning Repository. Experimental results indicate that the set-cover-based algorithm achieves superior performance in most cases and is efficient on data sets with many attributes.
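To make the reduction to set cover concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a standard discernibility-based construction: the elements to be covered are pairs of objects with different decision values, each attribute "covers" the pairs it distinguishes, and the weight of an attribute is its test cost. The selection rule is the classical greedy cost-effectiveness heuristic for weighted set cover, followed by a simple redundancy check. The function names `discernible_pairs` and `greedy_min_cost_cover`, and the toy decision table, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def discernible_pairs(rows, decisions):
    """Map each attribute index to the set of object pairs it discerns.

    Only pairs of objects with different decision values are considered.
    Pairs that no attribute distinguishes never appear in any set.
    """
    n_attrs = len(rows[0])
    covers = {a: set() for a in range(n_attrs)}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue
        for a in range(n_attrs):
            if rows[i][a] != rows[j][a]:
                covers[a].add((i, j))
    return covers


def greedy_min_cost_cover(covers, costs):
    """Greedy weighted set cover: repeatedly pick the attribute with the
    lowest test cost per newly covered pair, then drop redundant picks."""
    target = set().union(*covers.values())
    uncovered = set(target)
    chosen = []
    while uncovered:
        best = min(
            (a for a in covers if covers[a] & uncovered),
            key=lambda a: costs[a] / len(covers[a] & uncovered),
        )
        chosen.append(best)
        uncovered -= covers[best]
    # Redundancy check: try to discard attributes, most expensive first.
    for a in sorted(chosen, key=lambda a: -costs[a]):
        rest = [b for b in chosen if b != a]
        if set().union(*(covers[b] for b in rest)) >= target:
            chosen = rest
    return sorted(chosen)


# Toy decision table (assumed data): three condition attributes, one decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
test_costs = [5, 1, 3]

covers = discernible_pairs(rows, decisions)
selected = greedy_min_cost_cover(covers, test_costs)
print(selected, sum(test_costs[a] for a in selected))
```

In this toy table the cheap attribute 1 is picked first because it has the best cost per covered pair, but it becomes redundant once attribute 2 is added. The final redundancy check removes it, leaving attribute 2 alone at a total test cost of 3. The greedy rule is a heuristic and does not guarantee a minimal-cost reduct in general.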
Acknowledgments
This work is supported by grants from the National Natural Science Foundation of China (Nos. 61573321, 61272021, 61202206 and 61173181), the Zhejiang Provincial Natural Science Foundation of China (Nos. LZ12F03002, LY14F030001), the Open Foundation from Marine Sciences in the Most Important Subjects of Zhejiang (No. 20130109), and the Scientific Research Start-up Fund of Zhejiang Ocean University (No. 21065014715).
Ethics declarations
Conflict of interest
Author Anhui Tan declares that he has no conflict of interest. Author Weizhi Wu declares that he has no conflict of interest. Author Yuzhi Tao declares that she has no conflict of interest.
Ethical standard
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Cite this article
Tan, A., Wu, W. & Tao, Y. A set-cover-based approach for the test-cost-sensitive attribute reduction problem. Soft Comput 21, 6159–6173 (2017). https://doi.org/10.1007/s00500-016-2173-3
DOI: https://doi.org/10.1007/s00500-016-2173-3