DOI: 10.1145/1743384.1743433

Topic models for semantics-preserving video compression

Published: 29 March 2010

Abstract

Most state-of-the-art systems for content-based video understanding require video content to be represented as collections of many low-level descriptors, e.g., histograms of color, texture, or motion in local image regions. To preserve as much of the information in the original video as possible, these representations are typically high-dimensional, which conflicts with the goal of compact descriptors that enable greater efficiency and lower storage requirements.
In this paper, we address the problem of semantic compression of video, i.e., the reduction of low-level descriptors to a small number of dimensions while preserving most of the semantic information. To this end, we adapt topic models, which have previously been used as compact representations of still images, to take into account the temporal structure of a video as well as multi-modal components such as motion information.
Experiments on a large-scale collection of YouTube videos show that we can achieve a compression ratio of 20:1 compared to ordinary histogram representations, and of at least 2:1 compared to other dimensionality reduction techniques, without a significant loss of prediction accuracy. We also demonstrate improvements from our video-specific extensions that model temporal structure and multiple modalities.
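
To make the idea concrete, here is a minimal sketch of semantic compression with a topic model. It assumes an LDA-style model as implemented in scikit-learn rather than the authors' exact video-specific formulation, and the vocabulary size (1000 visual words), number of topics (50), and randomly generated keyframe histograms are purely illustrative assumptions.

```python
# Illustrative sketch only (hypothetical dimensions, synthetic data): compress
# high-dimensional bag-of-visual-words histograms into a few topic dimensions.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Placeholder input: 200 keyframes, each described by a 1000-bin histogram of
# quantized local descriptors (visual-word counts).
keyframe_histograms = rng.poisson(lam=1.0, size=(200, 1000))

# Fit a topic model and map each 1000-dimensional histogram to a
# 50-dimensional vector of topic proportions (roughly a 20:1 reduction).
lda = LatentDirichletAllocation(n_components=50, random_state=0)
topic_vectors = lda.fit_transform(keyframe_histograms)  # shape: (200, 50)

# A clip can then be summarized by pooling the topic vectors of its keyframes,
# e.g. by averaging, before training a classifier on the compact descriptor.
video_descriptor = topic_vectors[:25].mean(axis=0)  # first 25 keyframes
print(video_descriptor.shape)  # (50,)
```

The compact topic vectors would then replace the raw histograms as input to a downstream classifier, which is where the reduced dimensionality pays off in storage and efficiency.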

    Published In

    MIR '10: Proceedings of the international conference on Multimedia information retrieval
    March 2010
    600 pages
    ISBN:9781605588155
    DOI:10.1145/1743384

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content-based video retrieval
    2. topic models

    Qualifiers

    • Poster

    Conference

    MIR '10: International Conference on Multimedia Information Retrieval
    March 29-31, 2010
    Philadelphia, Pennsylvania, USA

    Cited By

    • (2020) Spatio-Temporal Ranked-Attention Networks for Video Captioning. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1606-1615. DOI: 10.1109/WACV45572.2020.9093291
    • (2014) Multimedia Topic Models Considering Burstiness of Local Features. IEICE Transactions on Information and Systems, E97.D(4):714-720. DOI: 10.1587/transinf.E97.D.714
    • (2013) Translating related words to videos and back through latent topics. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 485-494. DOI: 10.1145/2433396.2433456
    • (2013) A Thousand Frames in Just a Few Words. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2634-2641. DOI: 10.1109/CVPR.2013.340
