Abstract
Outliers are a constant companion of control engineers in everyday practice. Industrial reality is much richer than elementary linear, quadratic, Gaussian assumptions, and outliers appear for various and varying, often unknown, reasons. They attract research interest in statistical and regression analysis and in data mining, and there are many interesting algorithms and approaches to outlier detection, labelling, filtering and, finally, interpretation. Unfortunately, their impact on control systems has not received sufficient attention in research: their influence frequently goes unnoticed, is ignored or is not mentioned. This work focuses on outlier detection and labelling in the context of control system performance analysis. Selected statistical data-driven approaches are analyzed, as they can be easily implemented with limited a priori knowledge. The work consists of a simulation study followed by an analysis of real control data. Different outlier generation mechanisms are simulated, such as overlapping Gaussian processes (symmetric and asymmetric), artificially shifted points and fat-tailed distributions. The simulation observations are then compared against datasets from industrial control loops. The work concludes with a practical procedure intended to help practitioners deal with outliers in control engineering temporal data.
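As a rough illustration of the kind of experiment the abstract describes, the sketch below generates a Gaussian signal with a few artificially shifted points and labels outliers with a median/MAD rule. The abstract does not name the detectors studied, so the MAD rule, the function names and every parameter value here are assumptions chosen for illustration, not the paper's procedure.

```python
import random
import statistics


def simulate_contaminated(n=2000, outlier_share=0.02, shift=8.0, seed=0):
    """Gaussian 'control error' with a small share of artificially shifted points.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns the samples and the ground-truth set of shifted indices.
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shifted = set(rng.sample(range(n), round(n * outlier_share)))
    for i in shifted:
        # Push the point away from the bulk, in a random direction.
        data[i] += shift if rng.random() < 0.5 else -shift
    return data, shifted


def label_outliers_mad(data, threshold=3.0):
    """Label points whose robust z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    scale = 1.4826 * mad  # makes the MAD consistent with sigma for Gaussian data
    if scale == 0.0:
        return set()
    return {i for i, x in enumerate(data) if abs(x - med) / scale > threshold}


data, truth = simulate_contaminated()
labelled = label_outliers_mad(data)
hits = len(labelled & truth)
print(f"shifted: {len(truth)}, labelled: {len(labelled)}, correctly labelled: {hits}")
```

Because the median and MAD are barely affected by a small share of extreme points, the threshold stays close to what it would be for the clean signal. A mean and standard deviation rule would instead have its scale inflated by the very points it is meant to flag.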
Author information
Pawel D. Domański received the M.Sc., Ph.D. and D.Sc. degrees in control engineering from the Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland in 1967, 1991 and 1996, respectively. He has worked at the Institute of Control and Computational Engineering, Warsaw University of Technology, Poland since 1991. He is the author of one book and more than 100 publications. Apart from scientific research, he has participated in dozens of industrial implementations of advanced process control and optimization in the power and chemical industries all over the world.
His research interests include industrial advanced process control applications, control performance quality assessment and optimization.
Cite this article
Domański, P.D. Study on Statistical Outlier Detection and Labelling. Int. J. Autom. Comput. 17, 788–811 (2020). https://doi.org/10.1007/s11633-020-1243-2
DOI: https://doi.org/10.1007/s11633-020-1243-2