Abstract
In this paper, we present a system for person re-identification in TV series. In the context of video retrieval, person re-identification refers to the task where a user clicks on a person in a video frame and the system then finds other occurrences of the same person in the same or different videos. The main characteristic of this scenario is that no previously collected training data is available, so no person-specific models can be trained in advance. Additionally, the query data is limited to the image that the user clicks on. These conditions pose a great challenge to the re-identification system, which has to find the same person in other shots despite large variations in the person’s appearance. In this study, facial appearance is used as the re-identification cue because, in contrast to surveillance-oriented re-identification studies, the person can wear different clothing in different shots. To increase the amount of available face data, the proposed system employs a face tracker that can track faces up to full profile views. This makes it possible to use a profile face image as the query image and also to retrieve images with non-frontal poses. The tracker also provides temporal association of the face images in the video, so that whole tracks, rather than single images, can be used as query or target. A fast and robust face recognition algorithm is used to find matching faces. If a match is highly confident, the system adds the matching face track to the query set. Finally, if the user is not satisfied with the number of returned results, the system can present a small number of candidate face images and let the user confirm the ones that belong to the queried person. Both mechanisms increase the variation in the query set, making it possible to retrieve results with different poses, illumination conditions, etc. The system is extensively evaluated on two episodes of the TV series Coupling, showing very promising results.
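The query-set expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the L1 distance, the two thresholds `match_thr` and `confident_thr`, and all function names are assumptions made for this example, and feature vectors are plain lists of floats.

```python
def l1(a, b):
    """L1 distance between two feature vectors (the metric is an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def track_distance(track_a, track_b):
    """Closest distance between any face of track_a and any face of track_b."""
    return min(l1(fa, fb) for fa in track_a for fb in track_b)


def retrieve(tracks, query_id, match_thr, confident_thr):
    """Return the ids of tracks matched to the query track.

    tracks: dict mapping a track id to a list of feature vectors.
    A track within match_thr of any query-set track is returned as a match.
    A track within confident_thr is also added to the query set, so later
    comparisons can use its faces as additional query data.
    """
    query_set = {query_id}
    matches = set()
    grew = True
    while grew:  # repeat until no new track joins the query set
        grew = False
        for tid, faces in tracks.items():
            if tid in query_set:
                continue
            d = min(track_distance(tracks[q], faces) for q in query_set)
            if d <= match_thr:
                matches.add(tid)
            if d <= confident_thr:
                query_set.add(tid)
                grew = True
    return matches


# Toy data: one-dimensional "features" per face, one face per track.
tracks = {"q": [[0.0]], "a": [[1.0]], "b": [[2.5]], "c": [[10.0]]}
print(sorted(retrieve(tracks, "q", match_thr=2.0, confident_thr=1.0)))
```

In the toy data, track `b` is too far from the original query to match directly, but it is retrieved once the confidently matched track `a` has joined the query set. Tracks confirmed by the user could be handled the same way, by adding them to the query set before the loop runs again.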
Notes
Because matching against a large number of feature vectors is slow, and because the system only compares tracks with each other, the closest distance between the face images of each pair of tracks is precomputed as an additional preprocessing step and saved for later use.
A demonstration video of the system can be found at http://cvhci.anthropomatik.kit.edu/~mfischer/person-retrieval/.
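The precomputation mentioned in the first note can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the L1 metric, the dictionary layout, and the helper names `precompute_track_distances` and `lookup` are hypothetical.

```python
from itertools import combinations


def l1(a, b):
    """L1 distance between two feature vectors (the metric is an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def track_distance(track_a, track_b, dist=l1):
    """Closest distance between any face image of track_a and any of track_b."""
    return min(dist(fa, fb) for fa in track_a for fb in track_b)


def precompute_track_distances(tracks, dist=l1):
    """Build a table {(i, j): distance} for every pair of track ids with i < j.

    tracks: dict mapping a sortable track id to a list of feature vectors.
    """
    table = {}
    for i, j in combinations(sorted(tracks), 2):
        table[(i, j)] = track_distance(tracks[i], tracks[j], dist)
    return table


def lookup(table, i, j):
    """Fetch a precomputed distance regardless of argument order."""
    if i == j:
        return 0.0
    return table[(i, j)] if i < j else table[(j, i)]


# Toy data: three tracks with two-dimensional feature vectors.
tracks = {
    0: [[0.0, 0.0], [1.0, 1.0]],
    1: [[1.0, 2.0], [5.0, 5.0]],
    2: [[9.0, 9.0]],
}
table = precompute_track_distances(tracks)
print(lookup(table, 1, 0))  # closest pair is [1.0, 1.0] and [1.0, 2.0]
```

Storing one value per unordered track pair keeps query-time matching to a table lookup, at the cost of comparing all face pairs once up front.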
Acknowledgements
This study is partially funded by OSEO, French State agency for innovation, as part of the Quaero Programme, and by the “Concept for the Future” of the Karlsruhe Institute of Technology within the framework of the German Excellence Initiative.
Cite this article
Fischer, M., Ekenel, H.K. & Stiefelhagen, R. Person re-identification in TV series using robust face recognition and user feedback. Multimed Tools Appl 55, 83–104 (2011). https://doi.org/10.1007/s11042-010-0603-2