Abstract
Robust learning of mixture models in high dimensions remains an open challenge, especially in the current big data era. This paper investigates twelve variants of hybrid mixture models that combine G-means clustering with Gaussian and Student t-distribution mixture models for high-dimensional predictive modeling and anomaly detection. High-dimensional data is first reduced to a lower-dimensional subspace using whitened principal component analysis. For real-time data processing in batch mode, a technique based on the Gram-Schmidt orthogonalization process is proposed and demonstrated; it updates the reduced dimensions so that they remain relevant to the task objectives. In addition, a model-adaptation technique is proposed and demonstrated for big data incremental learning by statistically matching the mixture components' mean and variance vectors. The adapted parameters are computed as a weighted average that takes into account the sample sizes of the new and older statistics, with a parameter that scales down the influence of the older statistics in each iterative computation. The hybrid models' performance is evaluated using simulation and empirical studies. Results show that simple hybrid models without the Expectation-Maximization training step can achieve performance in high dimensions comparable to that of the more sophisticated models. For unsupervised anomaly detection, the hybrid models achieve a detection rate of \(\gtrsim 90\%\) with injected anomalies ranging from \(1\%\) to \(60\%\) on the KDD Cup 1999 network intrusion dataset.
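The weighted-average adaptation of a component's mean and variance vectors can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function `adapt_component`, the `decay` parameter (standing in for the parameter that scales down older statistics), and the moment-matching form of the variance update are not taken from the paper, whose exact formulas are not given in the abstract.

```python
from typing import List, Tuple


def adapt_component(
    mean_old: List[float],
    var_old: List[float],
    n_old: float,
    mean_new: List[float],
    var_new: List[float],
    n_new: float,
    decay: float = 0.9,  # hypothetical parameter: 1.0 keeps old statistics fully, 0.0 discards them
) -> Tuple[List[float], List[float], float]:
    """Blend old and new per-dimension mean/variance vectors of one component.

    Assumed update rule (not the paper's exact formula): each statistic is
    weighted by its sample size, and the old sample size is scaled by `decay`.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    w_old = decay * n_old
    w_new = float(n_new)
    total = w_old + w_new
    if total <= 0.0:
        raise ValueError("combined weight must be positive")

    # Weighted average of the means.
    mean = [
        (w_old * mo + w_new * mn) / total
        for mo, mn in zip(mean_old, mean_new)
    ]
    # Weighted average of the second moments E[x^2] = var + mean^2,
    # then subtract the squared blended mean to recover the variance.
    var = [
        (w_old * (vo + mo * mo) + w_new * (vn + mn * mn)) / total - m * m
        for vo, mo, vn, mn, m in zip(var_old, mean_old, var_new, mean_new, mean)
    ]
    # Return the effective sample size so it can serve as n_old next time.
    return mean, var, total


if __name__ == "__main__":
    mean, var, n_eff = adapt_component([0.0], [1.0], 100, [2.0], [1.0], 100, decay=1.0)
    print(mean, var, n_eff)
```

Passing the returned effective sample size back in as `n_old` on the next batch means the older statistics are discounted again at every iteration, which is one way to read "in each iterative computation".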
Notes
1. For reproducibility, the MATLAB scripts used to run the simulation and experimental studies in this paper are available from https://github.com/jennbing/hybrid-models.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ong, JB., Ng, WK. (2017). Hybrid Subspace Mixture Models for Prediction and Anomaly Detection in High Dimensions. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_23
DOI: https://doi.org/10.1007/978-3-319-69179-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69178-7
Online ISBN: 978-3-319-69179-4
eBook Packages: Computer Science (R0)