SVDD-based outlier detection on uncertain data

Liu, Bo; Xiao, Yanshan; Cao, Longbing; Hao, Zhifeng; Deng, Feiqi

doi:10.1007/s10115-012-0484-y

Regular Paper
Published: 06 May 2012

Volume 34, pages 597–618, (2013)

Knowledge and Information Systems

Bo Liu¹,
Yanshan Xiao²,
Longbing Cao³,
Zhifeng Hao² &
Feiqi Deng⁴

Abstract

Outlier detection is an important problem that has been studied within diverse research areas and application domains. Most existing methods assume that an example can be categorized exactly as either normal or an outlier. However, in many real-life applications, data are uncertain in nature due to various errors or partial completeness. This uncertainty makes outliers far more difficult to detect than they are in clearly separable data. The key challenge in handling uncertain data is how to reduce its impact on the learned distinctive classifier. This paper proposes a new approach, based on support vector data description (SVDD), to detect outliers in uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is generated by assigning each input example a confidence score, which indicates the likelihood that the example belongs to the normal class. In the second step, the generated confidence scores are incorporated into the SVDD training phase to construct a global distinctive classifier for outlier detection. In this phase, the examples with the lowest confidence scores contribute less to the construction of the decision boundary. The experiments show that the proposed approach outperforms state-of-the-art outlier detection techniques.
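
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not say how confidence scores are computed or how they enter the optimization, so everything below is an assumption made for illustration. The confidence score is a stand-in heuristic based on distance to the coordinate-wise median, the data description is a linear-kernel sphere fitted by subgradient descent on a confidence-weighted SVDD objective, and the names `confidence_scores`, `fit_weighted_svdd`, and `is_outlier`, as well as the parameters `C`, `lr`, and `epochs`, are hypothetical.

```python
import math
from statistics import median


def confidence_scores(points):
    """Step 1 (stand-in heuristic, NOT the paper's scoring rule):
    score each point in (0, 1] by its distance to the coordinate-wise median."""
    dim = len(points[0])
    center = [median(p[k] for p in points) for k in range(dim)]
    dists = [math.dist(p, center) for p in points]
    scale = median(dists) or 1.0  # avoid division by zero
    return [1.0 / (1.0 + (d / scale) ** 2) for d in dists]


def fit_weighted_svdd(points, weights, C=1.0, lr=0.01, epochs=2000):
    """Step 2 (assumed objective): minimise
        R^2 + C * sum_i w_i * max(0, ||x_i - a||^2 - R^2)
    over the centre a and squared radius R^2 by subgradient descent.
    Low-confidence points have small w_i, so they pull the sphere less."""
    dim = len(points[0])
    total = sum(weights)
    # Start from the confidence-weighted mean with a zero radius.
    a = [sum(w * p[k] for w, p in zip(weights, points)) / total
         for k in range(dim)]
    r2 = 0.0
    for _ in range(epochs):
        grad_a = [0.0] * dim
        grad_r2 = 1.0
        for p, w in zip(points, weights):
            d2 = sum((pk - ak) ** 2 for pk, ak in zip(p, a))
            if d2 > r2:  # point lies outside the sphere, so its slack is active
                grad_r2 -= C * w
                for k in range(dim):
                    grad_a[k] += C * w * 2.0 * (a[k] - p[k])
        a = [ak - lr * g for ak, g in zip(a, grad_a)]
        r2 = max(0.0, r2 - lr * grad_r2)
    return a, r2


def is_outlier(x, a, r2):
    """Flag x as an outlier if it falls outside the fitted sphere."""
    return sum((xk - ak) ** 2 for xk, ak in zip(x, a)) > r2


# Toy data: eight points on the unit circle plus one far-away point.
points = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
          for k in range(8)]
points.append((8.0, 8.0))

weights = confidence_scores(points)
center, radius_sq = fit_weighted_svdd(points, weights)
print("confidence of far point:", round(weights[-1], 4))
print("far point flagged:", is_outlier((8.0, 8.0), center, radius_sq))
print("origin flagged:", is_outlier((0.0, 0.0), center, radius_sq))
```

In this toy setting the far point receives a confidence well below that of the circle points, so its slack term is cheap and the sphere stays close to the unit circle instead of stretching to cover it. A kernelized dual solver would be needed to reproduce the flexible boundaries usually associated with SVDD.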

Author information

Authors and Affiliations

Faculty of Automation, Guangdong University of Technology, Guangdong, People’s Republic of China
Bo Liu
Faculty of Computer, Guangdong University of Technology, Guangdong, People’s Republic of China
Yanshan Xiao & Zhifeng Hao
Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
Longbing Cao
School of Automation Science and Engineering, South China University of Technology, Guangdong, People’s Republic of China
Feiqi Deng

Corresponding author

Correspondence to Zhifeng Hao.

About this article

Cite this article

Liu, B., Xiao, Y., Cao, L. et al. SVDD-based outlier detection on uncertain data. Knowl Inf Syst 34, 597–618 (2013). https://doi.org/10.1007/s10115-012-0484-y

Received: 09 May 2010
Revised: 13 June 2011
Accepted: 05 March 2012
Published: 06 May 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10115-012-0484-y
