Unsupervised modeling and feature selection of sequential spherical data through nonparametric hidden Markov models

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Spherical data (i.e., \(L_2\)-normalized vectors) arise in many real-life applications, such as gesture recognition and gene expression analysis, and modeling sequential spherical data has therefore become an important research topic in recent years. Hidden Markov models (HMMs), a class of probabilistic graphical models, have proven effective for modeling sequential data in previous work. In this article, we propose a nonparametric hidden Markov model (NHMM) for modeling time series or other sequences of spherical data vectors. In our model, the emission distribution of each hidden state is a mixture of von Mises (VM) distributions, which is better suited to spherical data than other popular distributions (e.g., the Gaussian distribution). Because we construct our NHMM with a Bayesian nonparametric model, namely the Dirichlet process, the number of hidden states and the number of mixture components for each state are adjusted automatically according to the observed data set. In addition, to handle high-dimensional data sets that may contain irrelevant or noisy features, our framework adopts feature selection, the process of selecting the “best” feature subset for describing a given data set. Specifically, an unsupervised localized feature selection method is incorporated into the NHMM, yielding a unified framework that performs data modeling and feature selection simultaneously. The model is learned with a convergence-guaranteed algorithm that we develop through variational Bayes. Experiments on both synthetic and real-world sequential data sets demonstrate the advantages of our model.
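
Two ingredients named in the abstract can be illustrated with a small sketch: \(L_2\) normalization, which places a vector on the unit sphere, and the Dirichlet process, shown here through a truncated stick-breaking draw of mixing weights. This is not the authors' implementation. The function names, the concentration value, and the truncation level are hypothetical, and the abstract does not state which representation of the Dirichlet process the model uses, so the stick-breaking form is an assumption.

```python
import math
import random


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]


def stick_breaking_weights(concentration, truncation, rng):
    """Draw mixing weights from a truncated stick-breaking construction.

    Each fraction v_k ~ Beta(1, concentration) breaks off part of the
    remaining stick. The last fraction is fixed to 1 so that the
    truncated weights sum to 1 (an assumption of this sketch).
    """
    weights = []
    remaining = 1.0
    for k in range(truncation):
        if k == truncation - 1:
            v = 1.0
        else:
            v = rng.betavariate(1.0, concentration)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
x = l2_normalize([3.0, 4.0])
pi = stick_breaking_weights(concentration=1.0, truncation=10, rng=rng)
```

With a small concentration value, most of the weight tends to fall on the first few components, which gives some intuition for how a Dirichlet process prior can leave unneeded states or mixture components with negligible weight.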

Data availability statement

The data sets analysed during the current study are available in the UCI Machine Learning Repository at https://archive.ics.uci.edu.

Acknowledgements

The completion of this work was supported by the National Natural Science Foundation of China (61876068).

Author information

Corresponding author

Correspondence to Wentao Fan.

About this article

Cite this article

Fan, W., Hou, W. Unsupervised modeling and feature selection of sequential spherical data through nonparametric hidden Markov models. Int. J. Mach. Learn. & Cyber. 13, 3019–3029 (2022). https://doi.org/10.1007/s13042-022-01579-7

  • DOI: https://doi.org/10.1007/s13042-022-01579-7
