Online Data Clustering Using Variational Learning of a Hierarchical Dirichlet Process Mixture of Dirichlet Distributions

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Abstract

This paper proposes an online clustering approach based on both hierarchical Dirichlet processes and Dirichlet distributions. Because hierarchical Dirichlet processes are nonparametric, deploying them resolves the model selection difficulties that arise when the number of mixture components is unknown. The Dirichlet distribution is chosen for its high flexibility in modeling non-Gaussian data, as shown in several previous works. The resulting statistical model is learned using variational Bayes and is evaluated on a challenging application, namely image clustering. The obtained results show the merits of the proposed statistical framework.
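As a rough illustration of the two ingredients named in the abstract, the following minimal sketch draws mixture weights from a truncated stick-breaking construction and evaluates a mixture of Dirichlet densities at a point on the simplex. It is not the authors' model: it uses a single-level Dirichlet process rather than the hierarchical one, it performs no variational or online learning, and the function names, truncation level, and parameter values are all assumptions chosen for the example.

```python
import math
import random


def dirichlet_log_pdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a point x on the probability simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights for a single-level Dirichlet process."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)  # fraction broken off the stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component takes whatever is left
    return weights


def mixture_log_pdf(x, weights, alphas):
    """Log-density of a mixture of Dirichlet components, computed with log-sum-exp."""
    terms = [
        math.log(w) + dirichlet_log_pdf(x, a)
        for w, a in zip(weights, alphas)
        if w > 0.0
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical setup: 5 components over 3-dimensional proportion vectors.
rng = random.Random(0)
weights = stick_breaking_weights(concentration=1.0, truncation=5, rng=rng)
alphas = [[rng.uniform(0.5, 5.0) for _ in range(3)] for _ in weights]
x = [0.2, 0.3, 0.5]

print("weights:", weights)
print("mixture log-density at x:", mixture_log_pdf(x, weights, alphas))
```

The truncation level caps the number of components only for computation. Under the stick-breaking prior, most of the weight tends to concentrate on a few components, which is the property that lets a nonparametric mixture avoid fixing the number of clusters in advance.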

Notes

  1. Other state-of-the-art local visual descriptors may provide better results; however, this is not the focus of this work.

  2. Database available at: http://vision.stanford.edu/aditya86/ImageNetDogs.

Acknowledgment

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Nizar Bouguila.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, W., Bouguila, N. (2014). Online Data Clustering Using Variational Learning of a Hierarchical Dirichlet Process Mixture of Dirichlet Distributions. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_2

  • DOI: https://doi.org/10.1007/978-3-662-43984-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43983-8

  • Online ISBN: 978-3-662-43984-5

  • eBook Packages: Computer Science, Computer Science (R0)
