Abstract
In recent years, outlier detection has attracted considerable attention. Identifying outliers is important in many applications, including those related to intrusion detection, credit card fraud, criminal activity in electronic commerce, medical diagnosis and anti-terrorism. Various outlier detection methods have been proposed for problems in different domains. In this paper, we propose a new outlier detection method from the perspectives of granular computing (GrC) and rough set theory. First, we define a class of outliers called GR-based outliers, where GR stands for GrC and rough sets. Second, to detect GR-based outliers, we propose an outlier detection algorithm called ODGrCR. Third, we evaluate the effectiveness of ODGrCR on a number of real data sets. The experimental results show that our algorithm is effective for outlier detection. In particular, it takes much less running time than other outlier detection methods.
Notes
The resultant data set is publicly available at: http://research.cmis.csiro.au/rohanb/outliers/breast-cancer/
Acknowledgments
This work is supported by the National Natural Science Foundation of China (grant nos. 60802042, 61273180), the Natural Science Foundation of Shandong Province, China (grant no. ZR2011FQ005), and the Project of Shandong Province Higher Educational Science and Technology Program (grant no. J11LG05).
Cite this article
Jiang, F., Chen, YM. Outlier detection based on granular computing and rough set theory. Appl Intell 42, 303–322 (2015). https://doi.org/10.1007/s10489-014-0591-4
DOI: https://doi.org/10.1007/s10489-014-0591-4