Abstract
The massive growth of process data in industrial systems has promoted the development of data-driven techniques, but outliers in process data degrade the effectiveness of these techniques. This paper focuses on detecting outliers in industrial systems under the assumption that no labeled training data are available. Our method is based on ensemble learning, and the base learners include both one-class classifiers and multi-class classifiers. The core idea is that a one-class classifier ensemble addresses the problem of missing labels, and a multi-class classifier ensemble further improves performance when outlier examples are available. The essential motivation for this proposal is that a classifier trained using only positive data will not perform as well as one trained using both positive and negative data. We investigate the performance of the proposed scheme in a series of experiments on ten benchmark data sets and two real-world industrial systems, and the results confirm the effectiveness of our detection scheme.
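The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base learners, the combination rule, or how outlier examples become available, so everything below is an assumption made for illustration. The sketch assumes bagging with majority voting, a toy centre-and-radius one-class learner, a toy nearest-centroid two-class learner, and that the points flagged by the first stage serve as the outlier examples for the second stage. All function names (`fit_one_class`, `fit_two_class`, `hybrid_detect`) are hypothetical.

```python
import random
import statistics

def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [statistics.fmean(c) for c in zip(*points)]

def fit_one_class(sample, quantile=0.9):
    """Toy one-class learner (assumed): a centre plus the distance at `quantile`."""
    centre = centroid(sample)
    d = sorted(dist(p, centre) for p in sample)
    radius = d[min(len(d) - 1, int(quantile * len(d)))]
    return centre, radius

def fit_two_class(normal, outlier):
    """Toy two-class learner (assumed): one centroid per class."""
    return centroid(normal), centroid(outlier)

def majority(votes):
    """True when more than half of the boolean votes are True."""
    return sum(votes) * 2 > len(votes)

def hybrid_detect(data, n_learners=15, seed=0):
    """Return one boolean per point; True means 'flagged as outlier'."""
    rng = random.Random(seed)
    # Stage 1: bagged one-class learners, trained without any labels.
    stage1 = [fit_one_class(rng.choices(data, k=len(data)))
              for _ in range(n_learners)]
    pseudo = [majority([dist(p, c) > r for c, r in stage1]) for p in data]
    normal = [p for p, flag in zip(data, pseudo) if not flag]
    outlier = [p for p, flag in zip(data, pseudo) if flag]
    if not normal or not outlier:
        return pseudo  # no counter-examples: keep the one-class result
    # Stage 2: bagged two-class learners trained on the pseudo-labelled sets
    # (each class is resampled separately so both are always represented).
    stage2 = [fit_two_class(rng.choices(normal, k=len(normal)),
                            rng.choices(outlier, k=len(outlier)))
              for _ in range(n_learners)]
    # A point is an outlier when most learners place it nearer the outlier centroid.
    return [majority([dist(p, co) < dist(p, cn) for cn, co in stage2])
            for p in data]

if __name__ == "__main__":
    # Synthetic data (assumed): 200 normal points and 10 injected outliers.
    rng = random.Random(42)
    inliers = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    injected = [[rng.gauss(8, 0.3), rng.gauss(8, 0.3)] for _ in range(10)]
    flags = hybrid_detect(inliers + injected)
    print(sum(flags[200:]), "of 10 injected outliers flagged;",
          sum(flags[:200]), "inliers flagged")
```

In this toy setting the second stage mainly serves to show where labelled counter-examples enter the pipeline. A real system would presumably substitute stronger base learners for both stages.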
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 51634002 and 61702070) and the National Key R&D Program of China (Grant No. 2017YFB0304104).
Ethics declarations
Conflict of interest
No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.
About this article
Cite this article
Wang, B., Mao, Z. Detecting outliers in industrial systems using a hybrid ensemble scheme. Neural Comput & Applic 32, 8047–8063 (2020). https://doi.org/10.1007/s00521-019-04307-5
DOI: https://doi.org/10.1007/s00521-019-04307-5