Study on Statistical Outlier Detection and Labelling

Domański, Paweł D.

doi:10.1007/s11633-020-1243-2

Research Article
Published: 21 October 2020

Volume 17, pages 788–811, (2020)
International Journal of Automation and Computing

Paweł D. Domański (ORCID: orcid.org/0000-0003-4053-3330)

Abstract

Outliers accompany control engineers in their everyday practice. Industrial reality is much richer than elementary linear, quadratic, Gaussian assumptions. Outliers appear for various and varying, often unknown, reasons. They attract research interest in statistical and regression analysis and in data mining. There are many interesting algorithms and approaches to outlier detection, labelling, filtering and, finally, interpretation. Unfortunately, their impact on control systems has not received sufficient attention in research. Their influence is frequently unnoticed, ignored or not mentioned. This work focuses on outlier detection and labelling in the context of control system performance analysis. Selected statistical data-driven approaches are analyzed, as they can be easily implemented with limited a priori knowledge. The study consists of a simulation part followed by an analysis of real control data. Different generation mechanisms are simulated, such as overlapping Gaussian processes (symmetric and asymmetric), artificially shifted points and fat-tailed distributions. Simulation observations are compared with datasets from industrial control loops. The work concludes with a practical procedure that should help practitioners deal with outliers in temporal control engineering data.
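
As a rough illustration of the kind of experiment the abstract describes, the Python sketch below generates a Gaussian signal contaminated by a second, shifted Gaussian process and labels outliers with two common statistical rules. The function names, the contamination parameters, and the choice of a median/MAD rule and an interquartile-range fence are assumptions made for this example only; the abstract does not state which detectors or settings the study actually uses.

```python
import random
import statistics


def simulate_contaminated(n=1000, frac=0.05, shift=8.0, seed=0):
    """Draw n samples from N(0, 1); a fraction `frac` comes from N(shift, 1).

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    data, truth = [], []
    for _ in range(n):
        contaminated = rng.random() < frac
        mean = shift if contaminated else 0.0
        data.append(rng.gauss(mean, 1.0))
        truth.append(contaminated)
    return data, truth


def label_mad(data, threshold=3.0):
    """Flag points farther than threshold * scaled MAD from the median."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad  # scales MAD to estimate sigma for Gaussian data
    return [abs(x - med) > threshold * scale for x in data]


def label_iqr(data, k=1.5):
    """Flag points outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in data]


if __name__ == "__main__":
    data, truth = simulate_contaminated()
    for name, rule in (("MAD", label_mad), ("IQR", label_iqr)):
        flags = rule(data)
        hits = sum(f and t for f, t in zip(flags, truth))
        print(f"{name}: flagged {sum(flags)}, of which {hits} were injected")
```

Both rules rely on robust location and scale estimates (median, MAD, quartiles) rather than the mean and standard deviation, which are themselves distorted by the points being searched for. With fat-tailed data either rule may flag many legitimate observations, so the thresholds here should be read as starting values rather than recommendations.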

Author information

Authors and Affiliations

Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw, 00-665, Poland
Paweł D. Domański

Corresponding author

Correspondence to Paweł D. Domański.

Additional information

Paweł D. Domański received the M.Sc., Ph.D. and D.Sc. degrees in control engineering from the Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland in 1967, 1991 and 1996, respectively. He has worked at the Institute of Control and Computation Engineering, Warsaw University of Technology, Poland since 1991. He is the author of one book and more than 100 publications. Apart from scientific research, he has participated in dozens of industrial implementations of advanced process control and optimization in the power and chemical industries all over the world.

His research interests include industrial advanced process control applications, control performance quality assessment and optimization.

About this article

Cite this article

Domański, P.D. Study on Statistical Outlier Detection and Labelling. Int. J. Autom. Comput. 17, 788–811 (2020). https://doi.org/10.1007/s11633-020-1243-2

Received: 03 April 2020
Accepted: 29 June 2020
Published: 21 October 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11633-020-1243-2
