Abstract
Efficient and robust video copy detection is important for many applications, such as commercial monitoring and social media retrieval. In this paper, with the aim of handling large-scale video data, we propose an efficient and robust video copy detection method that jointly exploits two characteristics of video: temporal continuity and multi-modality. The video is converted to a continuous sequence of states, and both visual and auditory features are extracted for each temporal frame. To tolerate the length variations introduced by video re-targeting, an efficient dynamic path search method is proposed to detect the target video clips, and highly compact audio fingerprints and visual ordinal features are jointly utilized in a flexible framework. The proposed scheme not only achieves high computational efficiency but also remains effective in real applications. Comparison experiments were conducted using video commercials and real television programs from four channels, as well as a benchmark video copy detection dataset, and the results demonstrate both the high efficiency and the high robustness of the proposed method.
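The abstract names visual ordinal features without describing them. As a rough illustration only, the sketch below shows one common form of an ordinal measure: a frame is split into a grid of blocks, the mean intensity of each block is computed, and the blocks are ranked by that mean. The grid size, the function names, and the L1 rank distance are assumptions chosen for illustration; they are not taken from the paper and may differ from its actual feature.

```python
from typing import List, Sequence


def ordinal_signature(frame: Sequence[Sequence[float]],
                      rows: int = 2, cols: int = 2) -> List[int]:
    """Return the rank of each block's mean intensity (0 = darkest).

    `frame` is a grayscale image given as a list of pixel rows.
    The rows x cols grid is a hypothetical choice for this sketch.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            # Pixel bounds of block (r, c) in the grid.
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    # Convert the block means to ranks.
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def ordinal_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two rank vectors (an assumed dissimilarity)."""
    return sum(abs(x - y) for x, y in zip(a, b))


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 120, 120],
    [50, 50, 120, 120],
]
# A uniformly brightened copy of the same frame.
brighter = [[p + 30 for p in row] for row in frame]

sig = ordinal_signature(frame)
sig_brighter = ordinal_signature(brighter)
```

Because only the ordering of the block means is kept, the brightened copy yields the same signature as the original in this example, which is the kind of invariance that makes rank-based features compact and tolerant of global intensity changes.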
Acknowledgments
This work is supported by the National Natural Science Foundation (NSF) of China (No. 61300056), the Ph.D. Programs Foundation of Ministry of Education of China (No. 20133401120005), the Anhui Provincial Natural Science Foundation of China (No. 1408085QF118) and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 201306282).
Cite this article
Li, T., Nian, F., Wu, X. et al. Efficient video copy detection using multi-modality and dynamic path search. Multimedia Systems 22, 29–39 (2016). https://doi.org/10.1007/s00530-014-0387-8
DOI: https://doi.org/10.1007/s00530-014-0387-8