Imprecise prior knowledge incorporating into one-class classification

Utkin, Lev V.; Zhuk, Yulia A.

doi:10.1007/s10115-013-0661-7

Regular Paper
Published: 30 May 2013

Knowledge and Information Systems, volume 41, pages 53–76 (2014)

Abstract

This paper studies an extension of Campbell and Bennett’s novelty detection (one-class classification) model that incorporates prior knowledge. The proposed extension relaxes the strong assumption of an empirical probability distribution over the elements of the training set and instead deals with a set of probability distributions produced by prior knowledge about the training data. The classification problem is solved either by considering the extreme points of the set of probability distributions or by means of the conjugate duality technique. Special cases of prior knowledge are considered in detail, including the imprecise linear-vacuous mixture model and interval-valued moments of feature values. Numerical experiments show that the proposed models outperform Campbell and Bennett’s model on many real and synthetic data sets.
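
As a rough illustration of the extreme-point idea mentioned in the abstract, the sketch below enumerates the extreme points of a linear-vacuous mixture (epsilon-contamination) set built around the uniform empirical distribution and computes the worst-case expected loss over that set. It is not the paper's formulation, which is not reproduced on this page; the function names, the loss values, and the choice of eps are assumptions made for illustration only.

```python
from typing import List


def extreme_points(n: int, eps: float) -> List[List[float]]:
    """Extreme points of the set {(1 - eps) * p0 + eps * q}, where p0 is the
    uniform (empirical) distribution over n training points and q ranges over
    all distributions on those points. Each extreme point puts the whole
    contamination mass eps on a single training point."""
    base = (1.0 - eps) / n
    points = []
    for k in range(n):
        p = [base] * n
        p[k] += eps
        points.append(p)
    return points


def upper_expectation(losses: List[float], eps: float) -> float:
    """Worst-case expected loss over the set, found by brute force over its
    extreme points (a linear function attains its maximum at a vertex)."""
    pts = extreme_points(len(losses), eps)
    return max(sum(p_i * l_i for p_i, l_i in zip(p, losses)) for p in pts)


def upper_expectation_closed_form(losses: List[float], eps: float) -> float:
    """Same quantity in closed form: (1 - eps) * mean loss + eps * max loss."""
    return (1.0 - eps) * sum(losses) / len(losses) + eps * max(losses)


if __name__ == "__main__":
    # Hypothetical per-point losses; not taken from the paper.
    example_losses = [0.0, 1.0, 3.0]
    print(upper_expectation(example_losses, 0.2))
    print(upper_expectation_closed_form(example_losses, 0.2))
```

With eps = 0 the set collapses to the empirical distribution and the worst-case loss reduces to the ordinary mean loss; larger eps values place more weight on the single worst training point.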

References

Augustin T (2002) Expected utility within a generalized concept of probability—a comprehensive framework for decision making under ambiguity. Stat Papers 43:5–22
Bartkowiak A (2011) Anomaly, novelty, one-class classification: a comprehensive introduction. Int J Comput Inf Syst Ind Manag Appl 3:61–71
Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust optimization. Princeton University Press, Princeton
Berger J (1985) Statistical decision theory and Bayesian analysis. Springer, New York
Bicego M, Figueiredo M (2009) Soft clustering using weighted one-class support vector machines. Pattern Recogn 42:27–32
Campbell C (2002) Kernel methods: a survey of current techniques. Neurocomputing 48(1–4):63–84
Campbell C, Bennett K (2001) A linear programming approach to novelty detection. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 13. MIT Press, Cambridge, pp 395–401
Cantelli F (1910) Intorno ad un teorema fondamentale della teoria del rischio. Boll. Assoc. Attuar. Ital. (Milan) 1–23
Chandola V, Banerjee A, Kumar V (2007) Anomaly detection: a survey. Tech. Rep. TR 07–017. University of Minnesota, Minneapolis MN USA
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41:1–58
Chapelle O, Scholkopf B (2001) Incorporating invariances in non-linear support vector machines. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 609–616
Cherkassky V, Mulier F (2007) Learning from data: concepts, theory, and methods. Wiley-IEEE Press, UK
Dayanik A, Lewis D, Madigan D, Menkov V, Genkin A (2006) Constructing informative prior distributions from domain knowledge in text classification. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, NY, USA, pp 493–500
Decoste D, Schölkopf B (2002) Training invariant support vector machines. Mach Learn 46(1–3):161–190
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Fung G, Mangasarian O, Shavlik J (2002) Knowledge-based support vector machine classifiers. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 521–528
Gao Y, Gao F (2010) Edited adaboost by weighted knn. Neurocomputing 73(16–18):3079–3088
Gilboa I, Schmeidler D (1989) Maxmin expected utility with non-unique prior. J Math Econ 18(2):141–153
Haasdonk B, Vossen A, Burkhardt H (2005) Invariance in kernel methods by haar-integration kernels. In: Kalviainen H, Parkkinen J, Kaarna A (eds) Image analysis, Lecture Notes in Computer Science, vol 3540. Springer, Berlin Heidelberg, pp 841–851
Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126
Huber P (1981) Robust statistics. Wiley, New York
Joachims T (2002) Learning to classify text using support vector machines: methods, theory and algorithms. Kluwer, Norwell
Kunapuli G, Bennett K, Shabbeer A, Maclin R, Shavlik J (2010) Online knowledge-based support vector machines. In: Machine learning and knowledge discovery in databases, Lecture Notes in Computer Science, vol 6322. Springer, Berlin/Heidelberg, pp 145–161
Kwok J, Tsang IH, Zurada J (2007) A class of single-class minimax probability machines for novelty detection. IEEE Trans Neural Netw 18(3):778–785
Lauer F, Bloch G (2008) Incorporating prior knowledge in support vector machines for classification: a review. Neurocomputing 71(7–9):1578–1594
Lauer F, Bloch G (2008) Incorporating prior knowledge in support vector regression. Mach Learn 70(1):89–118
Lee YJ, Mangasarian O, Wolberg W (2003) Survival-time classification of breast cancer patients. Comput Optim Appl 25(1–3):151–166
Li G, Jeyakumar V, Lee G (2011) Robust conjugate duality for convex optimization under uncertainty with application to data classification. Nonlinear Anal Theory Methods Appl 74(6):2327–2341
Li Y, de Ridder D, Duin R, Reinders M (2008) Integration of prior knowledge of measurement noise in kernel density classification. Pattern Recogn 41:320–330
Lu B, Wang X, Utiyama M (2009) Incorporating prior knowledge into learning by dividing training data. Front Comput Sci China 3(1):109–122
Mangasarian O (2005) Knowledge-based linear programming. SIAM J Optim 15(2):375–382
Markou M, Singh S (2003) Novelty detection: a review—part 1: statistical approaches. Signal Process 83(12):2481–2497
Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. In: Proceedings of the fifth annual international conference on Computational biology. ACM, New York, NY, USA, pp 249–255
Robert C (1994) The Bayesian choice. Springer, New York
Scholkopf B, Platt J, Shawe-Taylor J, Smola A, Williamson R (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
Scholkopf B, Simard P, Smola A, Vapnik V (1998) Prior knowledge in support vector kernels. In: Advances in neural information processing systems. Proceedings of the 1997 conference, vol 10. MIT Press, Cambridge, pp 640–646
Scholkopf B, Smola A (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge
Scholkopf B, Williamson R, Smola A, Shawe-Taylor J, Platt J (2000) Support vector method for novelty detection. In: Advances in neural information processing systems, pp 526–532
Small K, Wallace B, Brodley C, Trikalinos T (2011) The constrained weight space svm: learning with ranked features. In: Proc. of the 28th International Conference on Machine Learning (ICML). Omnipress, Bellevue, WA, USA, pp 865–872
Smola A, Scholkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
Steinwart I, Hush D, Scovel C (2005) A classification framework for anomaly detection. J Mach Learn Res 6:211–232
Sun Q, Wang LL, Lim S, DeJong G (2007) Robustness through prior knowledge: using explanation-based learning to distinguish handwritten Chinese characters. Int J Document Anal Recogn 10(3–4):175–186. doi:10.1007/s10032-007-0053-1
Sun Z, Zhang ZK, Wang HG (2008) Incorporating prior knowledge into kernel based regression. Acta Automatica Sinica 34(12):1515–1521
Tai F, Pan W (2007) Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms. Bioinformatics 23(14):1775–1782
Tax D, Duin R (1999) Support vector domain description. Pattern Recogn Lett 20:1191–1199
Tax D, Duin R (2004) Support vector data description. Mach Learn 54:45–66
Troffaes M (2007) Decision making under uncertainty using imprecise probabilities. Int J Approx Reason 45(1):17–29
Utkin L (2002) Imprecise calculation with the qualitative information about probability distributions. In: Grzegorzewski P, Hryniewicz O, Gil M (eds) Soft methods in probability, statistics and data analysis. Physica-Verlag, Heidelberg, pp 164–169
Utkin L (2003) Imprecise second-order hierarchical uncertainty model. Int J Uncertain Fuzziness Knowl Based Syst 11(3):301–317
Utkin L (2007) Second-order uncertainty calculations by using the imprecise Dirichlet model. Intell Data Anal 11(3):225–244
Utkin L, Augustin T (2007) Decision making under incomplete data using the imprecise Dirichlet model. Int J Approx Reason 44(3):322–338
Vapnik V (1998) Statistical learning theory. Wiley, New York
Veillard A, Racoceanu D, Bressan S (2011) Incorporating prior-knowledge in support vector machines by kernel adaptation. In: Proceedings of the IEEE 23rd international conference on tools with artificial intelligence. IEEE Computer Society, Washington, DC, USA, pp 591–596
Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman and Hall, London
Wang J, Lu H, Plataniotis K, Lu J (2009) Gaussian kernel optimization for pattern classification. Pattern Recogn 42(7):1237–1247
Wang L, Xue P, Chan KL (2004) Incorporating prior knowledge into SVM for image retrieval. In: Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol 2. IEEE Computer Society, Los Alamitos, CA, USA, pp 981–984
Wu X, Kumar V, Ross Q, Ghosh J, Yang Q, Motoda H, McLachlan G, Ng A, Liu B, Yu P, Zhou ZH, Steinbach M, Hand D, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
Wu X, Srihari R (2004) Incorporating prior knowledge with weighted margin support vector machines. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, NY, USA, pp 326–333
Xing Z, Pei J, Yu P, Wang K (2011) Extracting interpretable features for early classification on time series. In: Proceedings of the eleventh SIAM international conference on data mining. Omnipress, pp 247–258
Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recog Artif Intell 21(5):961–976
Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost proportionate example weighting. In: Proceedings of the third IEEE international conference on data mining. Melbourne, FL, pp 435–442
Zhao Z, Zhong P, Zhao Y (2011) Learning svm with weighted maximum margin criterion for classification of imbalanced data. Math Comput Model 54(3–4):1093–1099
Xu H, Caramanis C, Mannor S (2009) Robustness and regularization of support vector machines. J Mach Learn Res 10:1485–1510

Acknowledgments

We would like to express our appreciation to the anonymous referees and the editor whose very valuable comments have improved the paper.

Author information

Authors and Affiliations

Department of Control, Automation and System Analysis, St. Petersburg State Forest Technical University, Institutski per. 5, 194021, Saint Petersburg, Russia
Lev V. Utkin
Department of Information Systems and Technology, St. Petersburg State Forest Technical University, Institutski per. 5, 194021, Saint Petersburg, Russia
Yulia A. Zhuk

Corresponding author

Correspondence to Lev V. Utkin.

About this article

Cite this article

Utkin, L.V., Zhuk, Y.A. Imprecise prior knowledge incorporating into one-class classification. Knowl Inf Syst 41, 53–76 (2014). https://doi.org/10.1007/s10115-013-0661-7

Received: 01 November 2012
Revised: 26 February 2013
Accepted: 29 March 2013
Published: 30 May 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10115-013-0661-7
