Detecting outliers in categorical data through rough clustering

Suri, N. N. R. Ranga; Murty, M. Narasimha; Athithan, G.

doi:10.1007/s11047-015-9489-2

Published: 08 February 2015

Volume 15, pages 385–394, (2016)

Natural Computing

N. N. R. Ranga Suri¹,
M. Narasimha Murty² &
G. Athithan³

Abstract

Outlier detection is an important data mining task with many contemporary applications. Clustering-based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Additionally, clustering data described by categorical attributes is challenging, because suitable methods and measures for such data are difficult to define. To address these issues, a novel algorithm for clustering categorical data, aimed at outlier detection, is proposed here by modifying the standard \(k\)-modes algorithm. The uncertainty in the clustering process is addressed through a soft computing approach based on rough sets. Accordingly, the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough \(k\)-modes clustering algorithm for outlier detection is demonstrated using various benchmark categorical data sets.
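
The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of the general idea of rough assignment in a \(k\)-modes setting, not the authors' method. It assumes the simple matching (Hamming) dissimilarity commonly used with \(k\)-modes and an additive ambiguity threshold; the names `hamming`, `rough_assign`, `boundary_objects`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def rough_assign(objects, modes, threshold=1):
    """Place each object index in lower and/or upper approximations.

    Assumed rule (not taken from the article): an object belongs to the
    lower approximation of its nearest cluster only if no other mode is
    within `threshold` of the minimum distance. Otherwise it belongs only
    to the upper approximations of all nearly-as-close clusters.
    """
    k = len(modes)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for idx, obj in enumerate(objects):
        dists = [hamming(obj, m) for m in modes]
        d_min = min(dists)
        close = [j for j, d in enumerate(dists) if d - d_min <= threshold]
        if len(close) == 1:
            # Unambiguous membership: lower approximation (and hence upper).
            lower[close[0]].append(idx)
            upper[close[0]].append(idx)
        else:
            # Ambiguous membership: upper approximations only.
            for j in close:
                upper[j].append(idx)
    return lower, upper


def boundary_objects(lower, n):
    """Indices of objects that are in no lower approximation."""
    in_lower = {i for members in lower for i in members}
    return [i for i in range(n) if i not in in_lower]


# Toy categorical data: three attributes per object (hypothetical values).
objects = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("b", "y", "r"),
    ("b", "y", "s"),
    ("a", "y", "p"),  # sits between the two clusters
]
modes = [("a", "x", "p"), ("b", "y", "r")]

lower, upper = rough_assign(objects, modes, threshold=1)
boundary = boundary_objects(lower, len(objects))
print("lower:", lower)
print("upper:", upper)
print("boundary:", boundary)
```

In this sketch, objects whose membership is ambiguous end up only in upper approximations, which is one way the uncertainty about an object's cluster can be made explicit. How the article updates the modes and scores outliers from these approximations is not stated in the abstract and is not reproduced here.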

Author information

Authors and Affiliations

Centre for AI and Robotics, CVR Nagar, Bangalore, India
N. N. R. Ranga Suri
Department of CSA, Indian Institute of Science, Bangalore, India
M. Narasimha Murty
Scientific Analysis Group, Metcalfe House, Delhi, India
G. Athithan

Corresponding author

Correspondence to N. N. R. Ranga Suri.

About this article

Cite this article

Suri, N.N.R.R., Murty, M.N. & Athithan, G. Detecting outliers in categorical data through rough clustering. Nat Comput 15, 385–394 (2016). https://doi.org/10.1007/s11047-015-9489-2

Published: 08 February 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11047-015-9489-2
