Outlier detection based on granular computing and rough set theory

Applied Intelligence

Abstract

In recent years, outlier detection has attracted considerable attention. Identifying outliers is important in many applications, including intrusion detection, credit card fraud detection, the detection of criminal activity in electronic commerce, medical diagnosis and anti-terrorism. Various outlier detection methods have been proposed for problems in different domains. In this paper, we propose a new outlier detection method from the perspectives of granular computing (GrC) and rough set theory. First, we define a class of outliers called GR-based outliers, where GR stands for GrC and rough sets. Second, we propose an algorithm called ODGrCR to detect GR-based outliers. Third, we evaluate the effectiveness of ODGrCR on a number of real data sets. The experimental results show that the algorithm is effective for outlier detection and, in particular, takes much less running time than other outlier detection methods.
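The abstract does not give the definition of GR-based outliers or the steps of ODGrCR, so the following is only a minimal sketch of the general idea of scoring objects by the granules they fall into. It is not the authors' algorithm. It assumes categorical attributes, builds the equivalence classes (granules) induced by each attribute as in standard rough set theory, and uses a hypothetical score, the average of 1 - |granule| / |U| over the attributes, so that objects lying in small granules score higher. The function names `granules` and `rarity_scores` and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def granules(rows, attr):
    """Partition row indices into equivalence classes (granules):
    two rows share a granule when they agree on `attr`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[row[attr]].append(i)
    return list(classes.values())


def rarity_scores(rows, attrs):
    """Hypothetical outlier score (an assumption, not the paper's definition):
    mean over attributes of 1 - |granule containing the row| / |U|."""
    n = len(rows)
    scores = [0.0] * n
    for attr in attrs:
        for granule in granules(rows, attr):
            for i in granule:
                scores[i] += 1.0 - len(granule) / n
    return [s / len(attrs) for s in scores]


# Toy data set: three similar records and one that differs on every attribute.
rows = [
    {"protocol": "tcp", "flag": "ok"},
    {"protocol": "tcp", "flag": "ok"},
    {"protocol": "tcp", "flag": "ok"},
    {"protocol": "udp", "flag": "err"},
]
scores = rarity_scores(rows, ["protocol", "flag"])
most_suspicious = max(range(len(rows)), key=lambda i: scores[i])
```

In this toy example the last record sits alone in its granule for both attributes, so it receives the highest score. A real method would also need to handle continuous attributes, for instance by discretization, and to choose a threshold on the score.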

Notes

  1. The resulting data set is publicly available at: http://research.cmis.csiro.au/rohanb/outliers/breast-cancer/

Acknowledgments

This work is supported by the National Natural Science Foundation of China (grant nos. 60802042, 61273180), the Natural Science Foundation of Shandong Province, China (grant no. ZR2011FQ005), and the Project of Shandong Province Higher Educational Science and Technology Program (grant no. J11LG05).

Author information

Corresponding author

Correspondence to Feng Jiang.

About this article

Cite this article

Jiang, F., Chen, YM. Outlier detection based on granular computing and rough set theory. Appl Intell 42, 303–322 (2015). https://doi.org/10.1007/s10489-014-0591-4

  • DOI: https://doi.org/10.1007/s10489-014-0591-4
