Abstract
We study the problem of retrieving cartoon faces of celebrities given a photograph of their real face as the query. We refer to this problem as Photo2Cartoon. Photo2Cartoon is challenging because (i) cartoons vary widely in style and (ii) the modality gap between real and cartoon faces is large. To address these challenges, we present a discriminative deep metric learning approach for matching cross-modal faces and demonstrate it on Photo2Cartoon. The proposed approach learns a nonlinear transformation that projects real and cartoon face pairs into a common subspace in which the distance between positive pairs (same identity) is smaller than the distance between negative pairs (different identities). We evaluate our method on two public benchmarks, IIIT-CFW and Viewed Sketch, and show superior retrieval performance compared to related methods.
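The abstract does not specify the network architecture or the exact objective, so the following is only a minimal, hypothetical sketch of the general idea in standard-library Python. It assumes one two-layer tanh network per modality (photo and cartoon features may have different dimensions), a shared 4-d embedding space, and a standard contrastive hinge loss with a `margin` parameter. The names `init_layer`, `project`, `pair_loss`, and `rank_gallery` are illustrative and do not come from the paper. The sketch scores pairs and ranks a cartoon gallery for a photo query; learning the weights would additionally require gradient-based optimization, for example with an automatic differentiation library.

```python
import math
import random

# Hypothetical sketch: layer sizes, tanh activations, and the contrastive
# hinge loss are assumptions, not the paper's published configuration.


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights of shape n_out x n_in, zero biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map W x + b for a single input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def project(x, net):
    """Map a modality-specific feature vector into the shared subspace.

    net is a (hidden_layer, output_layer) pair; output = tanh(W2 tanh(W1 x + b1) + b2).
    """
    hidden = [math.tanh(v) for v in dense(x, net[0])]
    return [math.tanh(v) for v in dense(hidden, net[1])]


def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pair_loss(photo, cartoon, same_identity, photo_net, cartoon_net, margin=1.0):
    """Contrastive hinge loss for one photo/cartoon pair.

    Positive pairs are penalized by their squared distance (pulled together);
    negative pairs are penalized only while closer than `margin` (pushed apart).
    """
    d2 = sq_dist(project(photo, photo_net), project(cartoon, cartoon_net))
    if same_identity:
        return d2
    return max(0.0, margin - math.sqrt(d2)) ** 2


def rank_gallery(query_photo, gallery, photo_net, cartoon_net):
    """Rank (identity, cartoon_features) gallery entries by distance to the query."""
    q = project(query_photo, photo_net)
    scored = [(sq_dist(q, project(feats, cartoon_net)), ident) for ident, feats in gallery]
    return [ident for _, ident in sorted(scored)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Heterogeneous inputs: 8-d photo features, 6-d cartoon features, 4-d shared space.
    photo_net = (init_layer(8, 16, rng), init_layer(16, 4, rng))
    cartoon_net = (init_layer(6, 16, rng), init_layer(16, 4, rng))
    query = [rng.gauss(0, 1) for _ in range(8)]
    gallery = [(name, [rng.gauss(0, 1) for _ in range(6)]) for name in ("a", "b", "c")]
    print(rank_gallery(query, gallery, photo_net, cartoon_net))
    print(pair_loss(query, gallery[0][1], True, photo_net, cartoon_net))
```

Keeping a separate projection per modality is one common way to handle the heterogeneous inputs described in the abstract: each branch can absorb its own modality's characteristics before both land in the same space, where a single distance drives both the loss and the retrieval ranking.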






Notes
A cartoon is typically a non-realistic or semi-realistic artistic style of drawing or painting, or an image or series of images intended for satire, caricature, or humor [1].
References
Cartoon (from Wikipedia, the free encyclopedia). https://en.wikipedia.org/wiki/Cartoon. Accessed 2018-02-10
Bellet A, Habrard A, Sebban M (2013) A survey on metric learning for feature vectors and structured data. CoRR. arXiv:1306.6709
Cao Q, Shen L, Xie W, Parkhi OM, Zisserman A (2018) VGGFace2: a dataset for recognising faces across pose and age. In: FG
Crowley EJ, Parkhi OM, Zisserman A (2015) Face painting: querying art with photos. In: BMVC
Fan H, Cao Z, Jiang Y, Yin Q, Doudou C (2014) Learning deep face representation. CoRR. arXiv:1403.2802
Goldberger J, Hinton GE, Roweis ST, Salakhutdinov RR (2005) Neighbourhood components analysis. In: NIPS
Härdle WK, Simar L (2015) Canonical correlation analysis. In: Applied multivariate statistical analysis. Springer, pp 443–454
Hu P, Ramanan D (2017) Finding tiny faces. In: CVPR
Huang D, Wang YF (2013) Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In: ICCV
Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report 07-49, University of Massachusetts, Amherst
Huang X, Peng Y (2017) Cross-modal deep metric learning with multi-task regularization. arXiv preprint arXiv:1703.07026
Huo J, Gao Y, Shi Y, Yang W, Yin H (2016) Ensemble of sparse cross-modal metrics for heterogeneous face recognition. In: ACM-MM
Kang C, Liao S, He Y, Wang J, Niu W, Xiang S, Pan C (2015) Cross-modal similarity learning: a low rank bilinear formulation. In: CIKM
Kemelmacher-Shlizerman I, Seitz SM, Miller D, Brossard E (2016) The MegaFace benchmark: 1 million faces for recognition at scale. In: CVPR
Klare B, Jain AK (2013) Heterogeneous face recognition using kernel prototype similarities. IEEE Trans Pattern Anal Mach Intell 35(6):1410–1422
Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: ICML deep learning workshop, vol 2
Kumar N, Berg AC, Belhumeur PN, Nayar SK (2009) Attribute and simile classifiers for face verification. In: ICCV
Liong VE, Lu J, Tan YP, Zhou J (2017) Deep coupled metric learning for cross-modal matching. IEEE Trans Multimed 19(6):1234–1244
Martinez AM (1998) The AR face database. CVC technical report
Mauro R, Kubovy M (1992) Caricature and face recognition. Mem Cogn 20(4):433–440
Messer K, Matas J, Kittler J, Luettin J, Maitre G (1999) XM2VTSDB: the extended M2VTS database. In: Second international conference on audio and video-based biometric person authentication
Mignon A, Jurie F (2012) CMML: a new metric learning approach for cross modal matching. In: ACCV
Mishra A, Nandan Rai S, Mishra A, Jawahar CV (2016) IIIT-CFW: a benchmark database of cartoon faces in the wild. In: VASE ECCVW
Ouyang S, Hospedales TM, Song Y, Li X (2014) Cross-modal face matching: beyond viewed sketches. In: ACCV
Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. In: BMVC
Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: CVPR
Simonyan K, Parkhi OM, Vedaldi A, Zisserman A (2013) Fisher vector faces in the wild. In: BMVC
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR. arXiv:1409.1556
Song HO, Xiang Y, Jegelka S, Savarese S (2016) Deep metric learning via lifted structured feature embedding. In: CVPR
Sugiyama M (2006) Local fisher discriminant analysis for supervised dimensionality reduction. In: ICML
Wang X, Tang X (2009) Face photo-sketch synthesis and recognition. IEEE Trans Pattern Anal Mach Intell 31(11):1955–1967
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
Cite this article
Mishra, A. DHFML: deep heterogeneous feature metric learning for matching photograph and cartoon pairs. Int J Multimed Info Retr 8, 135–142 (2019). https://doi.org/10.1007/s13735-018-0160-4
DOI: https://doi.org/10.1007/s13735-018-0160-4