A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems

  • Methodologies and Application
Soft Computing

Abstract

Data drawn from search engines, market records, and patients’ symptoms and behaviours are common examples of set-valued data, in which a set of values is associated with a single entity. In real-world data, irrelevant attributes reduce experts' speed and predictive accuracy, because they raise dimensionality while contributing little information. Attribute selection aims to choose those attributes that are, ideally, both necessary and sufficient to describe the target knowledge. Rough set-based approaches can handle the uncertainty in real-valued information systems after a discretization step. In this paper, we introduce a novel approach for attribute selection in set-valued information systems based on tolerance rough set theory. We define a fuzzy tolerance relation between two objects using a similarity threshold. Reducts are found with the degree-of-dependency method, which selects the best subsets of attributes so as to extract more knowledge from the information system. To validate the proposed method, we establish results analogous to those of rough set theory. We also present a greedy algorithm, with illustrative examples, that demonstrates the approach without checking every pair of attributes in set-valued decision systems. Examples of computing a reduct of an incomplete information system with the proposed approach are also given. The proposed approach is compared with fuzzy rough-assisted attribute selection on one real benchmark dataset and with three existing attribute selection approaches on six real benchmark datasets to show its superiority.
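The pipeline described in the abstract (a thresholded similarity relation, a dependency degree, and a greedy search for a reduct) can be illustrated with a minimal sketch. The abstract does not state the similarity measure, so the Jaccard overlap used here is an assumption, as are the crisp threshold cut, the toy data, and every function and variable name. This is not the authors' algorithm, only one plausible instance of the general technique.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

Row = Dict[str, FrozenSet[str]]  # one object: attribute name -> set of values


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two value sets (assumed stand-in, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tolerance_class(rows: Sequence[Row], i: int, attrs: Sequence[str],
                    threshold: float) -> Set[int]:
    """Objects whose similarity to object i reaches the threshold on every attribute."""
    return {
        j for j, other in enumerate(rows)
        if all(similarity(rows[i][a], other[a]) >= threshold for a in attrs)
    }


def dependency(rows: Sequence[Row], decisions: Sequence[str],
               attrs: Sequence[str], threshold: float) -> float:
    """Fraction of objects whose whole tolerance class shares their decision."""
    positive = 0
    for i in range(len(rows)):
        cls = tolerance_class(rows, i, attrs, threshold)
        if all(decisions[j] == decisions[i] for j in cls):
            positive += 1
    return positive / len(rows)


def greedy_reduct(rows: Sequence[Row], decisions: Sequence[str],
                  attrs: Sequence[str], threshold: float) -> List[str]:
    """Add the attribute with the highest dependency until the full-set value is reached."""
    target = dependency(rows, decisions, attrs, threshold)
    chosen: List[str] = []
    best = dependency(rows, decisions, chosen, threshold)
    while best < target:
        remaining = [a for a in attrs if a not in chosen]
        scores = {a: dependency(rows, decisions, chosen + [a], threshold)
                  for a in remaining}
        pick = max(remaining, key=lambda a: scores[a])
        chosen.append(pick)
        best = scores[pick]
    return chosen


# Hypothetical toy decision system with two set-valued attributes.
ROWS: List[Row] = [
    {"lang": frozenset({"en", "fr"}), "hobby": frozenset({"chess"})},
    {"lang": frozenset({"en", "fr"}), "hobby": frozenset({"golf"})},
    {"lang": frozenset({"de"}), "hobby": frozenset({"chess"})},
    {"lang": frozenset({"de", "es"}), "hobby": frozenset({"golf"})},
]
DECISIONS = ["yes", "yes", "no", "no"]
```

On this toy data, `greedy_reduct(ROWS, DECISIONS, ["lang", "hobby"], 0.5)` returns `["lang"]`: with a threshold of 0.5 the `lang` attribute alone already separates the two decision classes, while `hobby` on its own has a dependency of 0. Because adding attributes can only shrink tolerance classes, the dependency never decreases and the loop stops at the latest when all attributes are chosen.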

Author information

Corresponding author

Correspondence to Shivam Shreevastava.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, S., Shreevastava, S., Som, T. et al. A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems. Soft Comput 24, 4675–4691 (2020). https://doi.org/10.1007/s00500-019-04228-4

  • DOI: https://doi.org/10.1007/s00500-019-04228-4
