Semi-supervised Learning from Active Noisy Soft Labels for Anomaly Detection

Martens, Timo; Perini, Lorenzo; Davis, Jesse

doi:10.1007/978-3-031-43412-9_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14169)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Anomaly detection aims to detect examples that do not conform to normal behavior. Increasingly, anomaly detection is approached from a semi-supervised perspective in which active learning is used to acquire a small number of strategically selected labels. However, because anomalies are not always well-understood events, the user may be uncertain about how to label certain instances. One can therefore relax the labeling request and allow the user to provide soft labels (i.e., probabilistic labels) that represent their belief that a queried example is anomalous. These labels are naturally noisy, both because of the user's inherent uncertainty about the label and because people are known to be bad at providing well-calibrated probability estimates. To cope with these challenges, we propose to exploit a Gaussian Process to learn from actively acquired soft labels in the context of anomaly detection. This makes it possible to leverage information about nearby examples to smooth out possible noise. Empirically, we compare our proposed approach to several baselines on 21 datasets and show that it outperforms them in the majority of experiments.
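The smoothing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is linked in the Notes); it assumes plain Gaussian Process regression with an RBF kernel and a Gaussian noise term, applied directly to the soft labels. The function names, the toy data, the prior mean of 0.5, and the clipping to [0, 1] are all hypothetical choices made for illustration.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior_mean(X_train, y_soft, X_query,
                      noise_var=0.05, length_scale=1.0, prior_mean=0.5):
    """GP regression posterior mean on soft labels (illustrative assumption).

    noise_var models the label noise: the larger it is, the more a single
    soft label is pulled toward the labels of nearby examples.
    """
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length_scale)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [y - prior_mean for y in y_soft])
    means = []
    for xq in X_query:
        k_star = [rbf(xq, xt, length_scale) for xt in X_train]
        m = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
        means.append(min(1.0, max(0.0, m)))  # clip to a valid probability
    return means

# Hypothetical 1-D data: four examples in a normal region, two in an
# anomalous region. The third soft label (0.6) is a noisy answer.
X_train = [[0.0], [0.2], [0.4], [0.6], [5.0], [5.2]]
y_soft = [0.1, 0.05, 0.6, 0.1, 0.9, 0.8]

# Query the noisy point itself, a point among the anomalies, and a point
# far from all labels (which falls back toward the prior mean).
smoothed = gp_posterior_mean(X_train, y_soft, [[0.4], [5.1], [2.6]])
print(smoothed)
```

In this sketch the noisy label at 0.4 is pulled down toward its low-probability neighbours, while the query near the anomalous examples keeps a high anomaly probability. Treating probabilities as regression targets is a simplification chosen for brevity.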

Notes

1. The code and Supplement are available via https://github.com/TimoM99/SLADe.
2. Results for \(0\%\) and \(10\%\) noise are, for completeness, in the Supplement.


Acknowledgment

This work is supported by the FWO-Vlaanderen (aspirant grant 1166222N to LP and G0D8819N to JD and TM) and the Flemish government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme (JD, LP).

Author information

Authors and Affiliations

DTAI Research Group and Leuven.AI, KULeuven, Leuven, Belgium
Timo Martens, Lorenzo Perini & Jesse Davis

Corresponding author

Correspondence to Timo Martens.

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Danai Koutra
University of Vienna, Vienna, Austria
Claudia Plant
Max Planck Institute for Software Systems, Kaiserslautern, Germany
Manuel Gomez Rodriguez
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

About this paper

Cite this paper

Martens, T., Perini, L., Davis, J. (2023). Semi-supervised Learning from Active Noisy Soft Labels for Anomaly Detection. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science (LNAI), vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_13

DOI: https://doi.org/10.1007/978-3-031-43412-9_13
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9
eBook Packages: Computer Science, Computer Science (R0)
