Abstract
In this paper, we propose a hierarchical Bayesian model, an improved hierarchical Dirichlet process-hidden Markov model (iHDP-HMM), for visual document analysis. The iHDP-HMM clusters visual documents and captures the temporal correlations between the visual words within each document, while adaptively identifying the number of document clusters and the number of visual topics. We develop a Bayesian inference mechanism for the iHDP-HMM that carries out likelihood evaluation, topic estimation, and cluster membership prediction. We apply the iHDP-HMM to simultaneously cluster motion trajectories and discover latent topics for trajectory words, using a proposed method for constructing the trajectory word codebook. We then develop an iHDP-HMM-based probabilistic trajectory retrieval framework. Experimental results verify the clustering accuracy of the iHDP-HMM and the retrieval accuracy of the proposed framework.
Cite this article
Hu, W., Tian, G., Li, X. et al. An Improved Hierarchical Dirichlet Process-Hidden Markov Model and Its Application to Trajectory Modeling and Retrieval. Int J Comput Vis 105, 246–268 (2013). https://doi.org/10.1007/s11263-013-0638-8
DOI: https://doi.org/10.1007/s11263-013-0638-8