DOI: 10.1145/1743384.1743433

Topic models for semantics-preserving video compression

Published: 29 March 2010

Abstract

Most state-of-the-art systems for content-based video understanding require video content to be represented as collections of many low-level descriptors, e.g., histograms of color, texture, or motion in local image regions. To preserve as much of the information in the original video as possible, these representations are typically high-dimensional, which conflicts with the goal of compact descriptors that enable greater efficiency and lower storage requirements.
In this paper, we address the problem of semantic compression of video, i.e., the reduction of low-level descriptors to a small number of dimensions while preserving most of the semantic information. To this end, we adapt topic models, which have previously been used as compact representations of still images, to take into account the temporal structure of a video as well as multi-modal components such as motion information.
Experiments on a large-scale collection of YouTube videos show that we can achieve a compression ratio of 20:1 compared to ordinary histogram representations, and of at least 2:1 compared to other dimensionality reduction techniques, without a significant loss of prediction accuracy. We also demonstrate improvements from our video-specific extensions that model temporal structure and multiple modalities.
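
To make the idea concrete, here is a minimal sketch of semantic compression with a topic model. It assumes an LDA-style model as implemented in scikit-learn rather than the authors' exact video-specific formulation, and the vocabulary size (1000 visual words), number of topics (50), and randomly generated keyframe histograms are purely illustrative assumptions.

```python
# Illustrative sketch only (hypothetical dimensions, synthetic data): compress
# high-dimensional bag-of-visual-words histograms into a few topic dimensions.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Placeholder input: 200 keyframes, each described by a 1000-bin histogram of
# quantized local descriptors (visual-word counts).
keyframe_histograms = rng.poisson(lam=1.0, size=(200, 1000))

# Fit a topic model and map each 1000-dimensional histogram to a
# 50-dimensional vector of topic proportions (roughly a 20:1 reduction).
lda = LatentDirichletAllocation(n_components=50, random_state=0)
topic_vectors = lda.fit_transform(keyframe_histograms)  # shape: (200, 50)

# A clip can then be summarized by pooling the topic vectors of its keyframes,
# e.g. by averaging, before training a classifier on the compact descriptor.
video_descriptor = topic_vectors[:25].mean(axis=0)  # first 25 keyframes
print(video_descriptor.shape)  # (50,)
```

The compact topic vectors would then replace the raw histograms as input to a downstream classifier, which is where the reduced dimensionality pays off in storage and efficiency.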

    Published In

    MIR '10: Proceedings of the international conference on Multimedia information retrieval
    March 2010
    600 pages
    ISBN:9781605588155
    DOI:10.1145/1743384

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content-based video retrieval
    2. topic models

    Qualifiers

    • Poster

    Conference

    MIR '10: International Conference on Multimedia Information Retrieval
    March 29-31, 2010
    Philadelphia, Pennsylvania, USA

    Cited By

    • (2020) Spatio-Temporal Ranked-Attention Networks for Video Captioning. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1606-1615. DOI: 10.1109/WACV45572.2020.9093291
    • (2014) Multimedia Topic Models Considering Burstiness of Local Features. IEICE Transactions on Information and Systems, E97.D(4):714-720. DOI: 10.1587/transinf.E97.D.714
    • (2013) Translating related words to videos and back through latent topics. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 485-494. DOI: 10.1145/2433396.2433456
    • (2013) A Thousand Frames in Just a Few Words. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2634-2641. DOI: 10.1109/CVPR.2013.340
