Automatic face clustering, which aims to group together faces that refer to the same person, is a key component of face tagging and image management. Standard face clustering approaches based on analyzing facial features can already achieve high-precision results. However, they often suffer from low recall because faces vary widely in pose, expression, illumination, occlusion, etc. To improve clustering recall without reducing the high precision, we leverage heterogeneous context information to iteratively merge clusters that refer to the same entities. We first investigate appropriate methods for utilizing context information at the cluster level, including the use of “common scene”, people co-occurrence, human attributes, and clothing. We then propose a unified framework that employs bootstrapping to automatically learn adaptive rules for integrating this heterogeneous contextual information with facial features. Finally, we discuss a novel methodology for integrating human-in-the-loop feedback mechanisms that leverage human interaction to achieve high-quality clustering results. Experimental results on two personal photo collections and one real-world surveillance dataset demonstrate the effectiveness of the proposed approach in improving recall while maintaining very high precision of face clustering.
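
The idea of iteratively merging high-precision clusters using context can be illustrated with a minimal sketch. It is not the framework proposed in the article: the cluster representation (`photos`, `companions`), the Jaccard-based `context_score`, the fixed `threshold`, and the rule that two clusters sharing a photo cannot be the same person are all assumptions made for illustration, whereas the article learns its merging rules through bootstrapping and combines several context types.

```python
from itertools import combinations


def shares_photo(a, b):
    """Assumed constraint: clusters appearing in the same photo are different people."""
    return bool(a["photos"] & b["photos"])


def context_score(a, b):
    """Hypothetical context similarity: Jaccard overlap of co-occurring people."""
    union = a["companions"] | b["companions"]
    if not union:
        return 0.0
    return len(a["companions"] & b["companions"]) / len(union)


def merge_clusters(clusters, threshold=0.5):
    """Greedily merge the best-scoring admissible pair until no pair reaches the threshold."""
    # Copy the input so the caller's clusters are not modified.
    clusters = [
        {"photos": set(c["photos"]), "companions": set(c["companions"])}
        for c in clusters
    ]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if shares_photo(a, b):
                continue  # skip pairs ruled out by the same-photo constraint
            score = context_score(a, b)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = {
            "photos": clusters[i]["photos"] | clusters[j]["photos"],
            "companions": clusters[i]["companions"] | clusters[j]["companions"],
        }
        del clusters[j]  # j > i, so index i stays valid


example = [
    {"photos": {"p1", "p2"}, "companions": {"alice", "bob"}},
    {"photos": {"p3"}, "companions": {"alice", "bob"}},
    {"photos": {"p1"}, "companions": {"carol"}},
]
result = merge_clusters(example)
```

In this example the first two clusters are merged because they share the same companions and never appear in the same photo, while the third stays separate. Merging only above a threshold is what keeps precision high in the sketch; lowering the threshold trades precision for recall.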

Note that, in general, models other than the flow model considered in the paper could be used to assign weights to paths. For example, paths that go through larger group nodes could be assigned higher weights, since larger groups of people tend to provide better context than smaller ones.
We note that while larger datasets exist (e.g., LFW and PubFig), they are not suitable for our work because they provide only single faces rather than whole images, whereas we focus on disambiguating faces in a photo collection.
Ahonen T, Hadid A et al (2006) Face description with local binary patterns: application to face recognition. IEEE Trans Pattern Anal
Amigo E et al (2008) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Technical Report
An L, Kafai M, Bhanu B (2013) Dynamic bayesian network for unconstrained face recognition in surveillance camera networks. IEEE J Emerg Select Topics Circuits Syst 3(2):155–164
An L, Bhanu B, Yang S (2012) Boosting face recognition in real-world surveillance videos. In: IEEE ninth international conference on advanced video and signal-based surveillance (AVSS), pp 270–275
Berg TL, Berg AC et al (2004) Names and faces in the news. In: IEEE ICPR
Chen Z, Kalashnikov DV, Mehrotra S (2007) Adaptive graphical approach to entity resolution. In: JCDL
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: CVPR
Etemad K, Chellappa R (1997) Discriminant analysis for recognition of human face images. In: AVBPA
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. In: Science
Gallagher A, Chen T (2008) Clothing cosegmentation for recognizing people. In: IEEE CVPR
Gallagher A, Chen T (2009) Understanding images of groups of people. In: IEEE CVPR
Kalashnikov DV, Chen Z, Mehrotra S, Nuray-Turan R (2011) Web people search via connection analysis. In: TKDE
Kumar N et al (2011) Describable visual attributes for face verification and image search. In: IEEE TPAMI
Lee YJ, Grauman K (2011) Face discovery with social context. In: BMVC
Nuray-Turan R, Kalashnikov DV, Mehrotra S (2012) Exploiting web querying for web people search. In: ACM TODS
Project Sherlock @ UCI. http://sherlock.ics.uci.edu
Shimizu K, Nitta N et al (2012) Classification based group photo retrieval with bag of people features. In: ICMR
Tang J, Hong R, Yan S, Chua T-S, Qi G-J, Jain R (2011) Image annotation by knn-sparse graph-based label propagation over noisily tagged web images. ACM Trans Intell Syst Technol 2(2):14–23
Tang J, Zha Z-J, Tao D, Chua T-S (2012) Semantic-gap-oriented active learning for multilabel image annotation. IEEE Trans Image Process 21(4):2354–2360
Tang J, Yan S, Hong R, Qi G, Chua T (2009) Inferring semantic concepts from community-contributed images and noisy tags. In: ACM multimedia
Wu P, Tang F (2010) Improving face clustering using social context. In: ACM multimedia
Yagnik J, Islam A (2007) Learning people annotation from the web via consistency learning. In: MIR
Zhang W et al (2010) Beyond face: improving person clustering in consumer photos by exploring contextual information. In: ICME
Zhang L, Kalashnikov DV, Mehrotra S (2013) A unified framework for context assisted face clustering. In: ACM international conference on multimedia retrieval (ACM ICMR 2013), Dallas
Zhang L, Kalashnikov DV, Mehrotra S, Vaisenberg R (2013) Context-based person identification framework for smart video surveillance. Machine Vision and Applications, pp 1–15
Zhang L, Vaisenberg R, Mehrotra S, Kalashnikov DV (2011) Video entity resolution: Applying er techniques for smart video surveillance. In: PerCom workshops
Zhang L, Zhang K, Li C (2008) A topical pagerank based algorithm for recommender systems. In: ACM conference on research and development in information retrieval (SIGIR), pp 713–714
Zhao M, Teo Y et al (2006) Automatic person annotation of family photo album. In: CIVR
Additional information
This work was supported in part by NSF grants CNS-1118114, CNS-1059436, and CNS-1063596. It is part of the NSF-supported project Sherlock @ UCI (http://sherlock.ics.uci.edu), a UC Irvine project on Data Quality and Entity Resolution [16].
Cite this article
Zhang, L., Kalashnikov, D.V. & Mehrotra, S. Context-assisted face clustering framework with human-in-the-loop. Int J Multimed Info Retr 3, 69–88 (2014). https://doi.org/10.1007/s13735-014-0052-1
DOI: https://doi.org/10.1007/s13735-014-0052-1