ABSTRACT
There has been tremendous growth in video data over the last decade. People use mobile phones and tablets to record, share, and watch videos more than ever before, and video cameras surround us almost everywhere in the public domain (e.g. stores, streets, public facilities, etc.). Efficient and effective retrieval methods are therefore critically needed across many applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems: recognizing predefined visual concepts, searching videos for complex ad-hoc user queries, searching by image/video example to retrieve specific objects, persons, or locations, detecting events, and finally bridging the gap between vision and language by examining how systems can automatically describe videos in natural language. A review of the state of the art, current challenges, and future directions, along with pointers to useful resources, will be presented by regular TRECVID participating teams. Each team will present one of the following tasks:
Semantic INdexing (SIN)
Zero-example (0Ex) Ad-hoc Video Search (AVS)
Instance Search (INS)
Multimedia Event Detection (MED)
Video to Text (VTT)
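The "uniform scoring procedures" mentioned above rank systems by retrieval effectiveness; the core measure underlying most TRECVID search tasks is (mean) average precision, though the official evaluations typically use sampled variants such as extended inferred average precision. A minimal sketch of the plain metric, with hypothetical document identifiers, assuming each query supplies a ranked result list and a ground-truth relevant set:

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k taken at each rank k
    where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)
```

For example, a ranking `["s1", "s2", "s3", "s4"]` with relevant set `{"s1", "s3"}` scores (1/1 + 2/3)/2 ≈ 0.833; averaging such scores over all queries in a run gives the MAP reported on leaderboards.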
Index Terms
- Video Indexing, Search, Detection, and Description with Focus on TRECVID