ABSTRACT
There has been tremendous growth in video data over the last decade. People use mobile phones and tablets to record, share, and watch videos more than ever before, and video cameras surround us almost everywhere in the public domain (e.g. stores, streets, public facilities, etc.). Efficient and effective retrieval methods are therefore critically needed across many applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems: recognizing predefined visual concepts, searching videos for complex ad-hoc user queries, searching by image/video example to retrieve specific objects, persons, or locations, detecting events, and finally bridging the gap between vision and language by examining how systems can automatically describe videos in natural language. A review of the state of the art, current challenges, and future directions, along with pointers to useful resources, will be presented by regular TRECVID participating teams. Each team will present one of the following tasks:
Semantic INdexing (SIN)
Zero-example (0Ex) Ad-hoc Video Search (AVS)
Instance Search (INS)
Multimedia Event Detection (MED)
Video to Text (VTT)
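The "uniform scoring procedures" mentioned above rank systems by retrieval effectiveness; the core measure underlying most TRECVID search tasks is (mean) average precision, though the official evaluations typically use sampled variants such as extended inferred average precision. A minimal sketch of the plain metric, with hypothetical document identifiers, assuming each query supplies a ranked result list and a ground-truth relevant set:

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k taken at each rank k
    where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)
```

For example, a ranking `["s1", "s2", "s3", "s4"]` with relevant set `{"s1", "s3"}` scores (1/1 + 2/3)/2 ≈ 0.833; averaging such scores over all queries in a run gives the MAP reported on leaderboards.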
Index Terms
- Video Indexing, Search, Detection, and Description with Focus on TRECVID