Abstract
The main idea of interactive search is to gradually improve the search quality of a retrieval system through user interaction. While a large amount of work has been done in the past, most existing approaches require labeling effort to update the query model. Unfortunately, labeling a large number of training examples is time-consuming and tedious. We aim to develop a novel text-driven cooperative learning scheme that offers users a natural query style and significantly alleviates their burden without compromising search performance. Starting from an advanced text-driven video search engine, a multi-view cooperative training strategy is proposed for learning a refined ranking function from feedback data. The main merit of the proposed framework is its ability to mine training samples automatically from the previous answer set and to implicitly combine multiple modalities for effectively learning users' query intent. Evaluation on the TRECVID 2006 video corpus shows that the proposed scheme, with only a few training seeds, achieves performance comparable to classic interactive schemes.
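The multi-view cooperative training strategy described above can be illustrated with a minimal sketch. Two classifiers, one per modality (here labeled "text" and "visual" for illustration), take turns labeling their most confident unlabeled examples, and the pseudo-labels are shared between views. The feature values, the nearest-centroid learner, and all function names below are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal co-training sketch (illustrative, not the paper's system):
# two views of the same samples, a simple nearest-centroid learner per
# view, and a loop in which each view pseudo-labels its most confident
# unlabeled sample for the shared label set.

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class NearestCentroid:
    def fit(self, X, y):
        self.cents = {c: centroid([x for x, lab in zip(X, y) if lab == c])
                      for c in set(y)}
        return self

    def predict_with_margin(self, x):
        """Return (label, confidence), confidence being the distance
        gap between the best and second-best class centroid."""
        d = sorted((dist(x, c), lab) for lab, c in self.cents.items())
        return d[0][1], d[1][0] - d[0][0]

def co_train(view1, view2, labels, unlabeled_idx, rounds=2):
    """Samples 0..len(labels)-1 are labeled; each round, each view
    pseudo-labels its most confident remaining unlabeled sample."""
    y = {i: lab for i, lab in enumerate(labels)}
    pool = list(unlabeled_idx)
    for _ in range(rounds):
        for view in (view1, view2):
            if not pool:
                break
            clf = NearestCentroid().fit([view[i] for i in y], list(y.values()))
            best = max(pool, key=lambda i: clf.predict_with_margin(view[i])[1])
            y[best] = clf.predict_with_margin(view[best])[0]
            pool.remove(best)
    return y
```

In the paper's setting, the labeled seeds would come from the previous answer set rather than manual annotation, which is what reduces the user's labeling burden.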
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 60602030, No. 90604032), the 973 Program (No. 2006B30314), the 863 Program (No. 2007AA01Z175), PCSIRT (No. IRT0707), and the Specialized Research Foundation of BJTU (No. 2005SM013, No. 2005SZ005).
Cite this article
Wei, S., Zhao, Y., Zhu, Z. et al. A Cooperative Learning Scheme for Interactive Video Search. J Sign Process Syst Sign Image Video Technol 59, 189–199 (2010). https://doi.org/10.1007/s11265-008-0287-2