Abstract
This paper describes methods for automatically associating faces detected in multimedia documents with the names that appear in the surrounding metadata. We consider the task in the image matching (IM) framework, where external Web facial images are retrieved automatically in advance as the gallery face set for each name, and a detected face is assigned to one of the names, or to none of them, according to the association scores between the two kinds of faces and a set of constraints. Several important issues are investigated within the IM framework. In collecting Web facial images, beyond the basic scheme that uses only a celebrity name as the query to crawl facial images, a context-assisted image search method is proposed to enhance the relevance and discriminability of the retrieved faces. In constraint formulation, we propose an assigning-thresholding (AT) pipeline that uniformly ensures the name-face correspondence is strictly one-to-one and sets low-confidence associations to null assignments. In association score computation, we propose methods that jointly consider IM and the well-established graph-based association (GA) method at different stages, aiming to produce more accurate scores. Based on these efforts, we propose two methods: Accu-IM, which performs the association as accurately as possible, and Fast-IM, which performs it in real time. Extensive experiments on datasets of captioned news images and Web videos demonstrate the advantages of the proposed components, individually and jointly; they consistently yield improvements under different settings when compared with state-of-the-art methods.
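The assigning-thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the one-to-one assignment is solved, so a simple greedy pass stands in for it here (an optimal solver such as the Hungarian algorithm could be substituted). The function name, the score matrix, the names, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Optional


def assign_then_threshold(
    scores: List[List[float]],
    names: List[str],
    threshold: float,
) -> Dict[int, Optional[str]]:
    """Map each face index to a name, or to None for a null assignment.

    scores[i][j] is a hypothetical association score between face i and name j.
    """
    # Assigning step (greedy stand-in, an assumption): visit face-name pairs
    # from the highest score down and keep a pair only if both its face and
    # its name are still free, so the correspondence stays one-to-one.
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda t: -t[0],
    )
    result: Dict[int, Optional[str]] = {i: None for i in range(len(scores))}
    used_faces, used_names = set(), set()
    for s, i, j in pairs:
        if i in used_faces or j in used_names:
            continue
        used_faces.add(i)
        used_names.add(j)
        # Thresholding step: a low-confidence pair becomes a null assignment.
        if s >= threshold:
            result[i] = names[j]
    return result


# Three detected faces, two candidate names (illustrative values only).
scores = [
    [0.9, 0.2],
    [0.8, 0.3],
    [0.1, 0.1],
]
result = assign_then_threshold(scores, ["Alice", "Bob"], threshold=0.5)
print(result)
```

In this toy input, face 0 takes "Alice"; face 1 is left with "Bob" at a score of 0.3, which falls below the threshold and becomes a null assignment; face 2 finds no free name and also stays unassigned.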
Notes
In fact, his true name is Jan Kraus. He is recognized as Jana Krause since he is known as the host of a famous TV show named Jana Krause.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61303175, 61303171, 61602463).
Cite this article
Chen, Z., Zhang, W., Deng, B. et al. Name-face association with web facial image supervision. Multimedia Systems 25, 1–20 (2019). https://doi.org/10.1007/s00530-017-0544-y
DOI: https://doi.org/10.1007/s00530-017-0544-y