Multi-domain Active Learning for Semi-supervised Anomaly Detection

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13716)

Abstract

Active learning aims to ease the burden of collecting large amounts of annotated data by intelligently acquiring, during the learning process, the labels that will be most helpful to the learner. Current active learning approaches focus on learning from a single dataset. However, a common setting in practice requires simultaneously learning models from multiple datasets, where each dataset requires a separately learned model. This paper tackles this less-explored multi-domain active learning setting. We approach it from the perspective of multi-armed bandits and propose the active learning bandits (Alba) method, which uses bandit methods to both explore and exploit the usefulness of querying a label from each dataset in subsequent query rounds. We evaluate our approach on a benchmark of 7 datasets collected from a retail environment, in the context of a real-world use case of detecting anomalous resource usage. Alba outperforms existing active learning strategies, providing evidence that standard active learning approaches are less suitable for the multi-domain setting.
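To make the bandit framing concrete, the following sketch shows a generic Beta-Bernoulli Thompson sampling loop that decides, in each query round, which dataset to request a label from. It illustrates the general idea only and is not the paper's Alba algorithm: the reward callback query_and_reward, the per-dataset Beta posteriors, and the update rule are assumptions made for this example.

    # Generic bandit loop over datasets ("arms") for multi-domain active learning.
    # Illustrative sketch only, not the paper's Alba method. `query_and_reward`
    # is a hypothetical callback that queries one label from the chosen dataset
    # and returns a reward in [0, 1] (e.g., the observed gain of that dataset's model).
    import numpy as np

    def run_query_rounds(datasets, query_and_reward, budget, seed=0):
        rng = np.random.default_rng(seed)
        # One Beta(alpha, beta) posterior per dataset over its expected reward.
        alpha = np.ones(len(datasets))
        beta = np.ones(len(datasets))
        history = []
        for _ in range(budget):
            # Explore/exploit: sample a plausible reward for every dataset
            # and query a label from the one whose sample is highest.
            sampled = rng.beta(alpha, beta)
            arm = int(np.argmax(sampled))
            reward = float(query_and_reward(datasets[arm]))
            # Update the chosen dataset's posterior with the observed reward.
            alpha[arm] += reward
            beta[arm] += 1.0 - reward
            history.append((datasets[arm], reward))
        return history

Under this formulation, datasets whose queried labels keep yielding high rewards are selected more often, while rarely rewarding datasets are still revisited occasionally because their posteriors remain uncertain.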


Notes

  1. Appendix & Code: https://github.com/Vincent-Vercruyssen/ALBA-paper.

  2. Because AL typically operates with a small budget and large datasets, this assumption is reasonable as the marginal benefit of labeling each additional instance is small.

  3. Section 5 provides empirical evidence that the random selection strategy indeed leads to better results than the heuristic strategy. Note that the proof relies on the heuristic strategy being able to rank the instances correctly according to their informativeness. In reality, this ranking is approximate.

  4. Online Appendix 7.2 has detailed information on the (choice of) evaluation metrics, benchmark data, and hyperparameters. It also has additional results on the impact of the dataset characteristics on Alba’s performance.

  5. We use 8 statistical (average, standard deviation, max, min, median, sum, entropy, skewness, and kurtosis) and 2 binary features (whether it is a Friday or a Sunday), 10 in total; an illustrative sketch of this feature construction follows these notes.

  6. See online Appendix 7.3 for the plots for all 54 benchmark datasets.

  7. All the experimental evaluations maintain a precision of 1e-4 and a threshold of 0.001 (e.g., to determine the similarity of two AULC scores).

  8. See online Appendix 7.3 for a more detailed discussion.
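For concreteness, the sketch below spells out the per-window feature construction described in note 5: summary statistics over a window of resource-usage readings plus two binary weekday indicators. It is a hypothetical reconstruction, not the paper's code; in particular, the note does not specify how entropy is computed, so here it is taken over a 10-bin histogram of the window's values, and the sketch simply computes every quantity the note names.

    # Hypothetical per-window feature construction (cf. note 5 above).
    # Feature and function names are assumptions, not taken from the paper's code.
    import numpy as np
    from scipy.stats import entropy, kurtosis, skew

    def window_features(values, timestamp):
        # values: 1-D array of readings in one window (e.g., one day of usage).
        # timestamp: datetime-like object; .weekday() gives Mon=0 ... Sun=6.
        values = np.asarray(values, dtype=float)
        counts, _ = np.histogram(values, bins=10)  # entropy() normalizes the counts
        return {
            "average": values.mean(),
            "std": values.std(),
            "max": values.max(),
            "min": values.min(),
            "median": float(np.median(values)),
            "sum": values.sum(),
            "entropy": float(entropy(counts)),
            "skewness": float(skew(values)),
            "kurtosis": float(kurtosis(values)),
            "is_friday": float(timestamp.weekday() == 4),
            "is_sunday": float(timestamp.weekday() == 6),
        }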


Acknowledgements

This work is supported by the Flemish government “Onderzoeksprogramma Artificiële Intelligentie Vlaanderen”, “Agentschap Innoveren & Ondernemen (VLAIO)” as part of the innovation mandate HBC.2020.2297, the FWO-Vlaanderen aspirant grant 1166222N, and Leuven.AI, B-3000 Leuven, Belgium.

Author information


Corresponding author

Correspondence to Vincent Vercruyssen.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1729 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vercruyssen, V., Perini, L., Meert, W., Davis, J. (2023). Multi-domain Active Learning for Semi-supervised Anomaly Detection. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_30


  • DOI: https://doi.org/10.1007/978-3-031-26412-2_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science, Computer Science (R0)
