Audiovisual synchrony assessment for replay attack detection in talking face biometrics

Boutellaa, Elhocine; Boulkenafet, Zinelabidine; Komulainen, Jukka; Hadid, Abdenour

doi:10.1007/s11042-015-2848-2

Published: 18 August 2015

Multimedia Tools and Applications, volume 75, pages 5329–5343 (2016)

Abstract

Audiovisual speech synchrony detection is an important liveness check for talking face verification systems: it ensures that the input biometric samples are actually acquired from the same source. In prior work, visual speech features have mainly described facial appearance or mouth shape frame by frame, thus ignoring lip motion between consecutive frames. Because visual speech dynamics are also important, we take spatiotemporal information into account and propose using space-time auto-correlation of gradients (STACOG) to measure audiovisual synchrony. To evaluate the effectiveness of the proposed approach, we design a set of challenging and realistic attack scenarios by augmenting the publicly available BANCA and XM2VTS datasets with synthetic replay attacks. Our experimental analysis shows that STACOG features outperform state-of-the-art features, e.g. those based on the discrete cosine transform, in measuring audiovisual synchrony.
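
The idea of scoring synchrony as a correlation between the audio and visual feature streams can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how the two streams are compared, so canonical correlation analysis is used here only as an assumed, commonly used choice, and random vectors stand in for real acoustic and STACOG features. The names `first_canonical_correlation` and `demo_scores`, the feature dimensions, and the noise level are all hypothetical.

```python
import numpy as np


def first_canonical_correlation(audio, visual, reg=1e-6):
    """Largest canonical correlation between two sequences (rows = frames).

    Assumption: both streams are already sampled at the same frame rate.
    """
    a = audio - audio.mean(axis=0)
    v = visual - visual.mean(axis=0)
    n = a.shape[0]
    # Regularised covariance of each modality and their cross-covariance.
    caa = a.T @ a / (n - 1) + reg * np.eye(a.shape[1])
    cvv = v.T @ v / (n - 1) + reg * np.eye(v.shape[1])
    cav = a.T @ v / (n - 1)
    # Whiten both modalities with Cholesky factors; the singular values of
    # the whitened cross-covariance are the canonical correlations.
    la = np.linalg.cholesky(caa)
    lv = np.linalg.cholesky(cvv)
    m = np.linalg.solve(la, cav)
    m = np.linalg.solve(lv, m.T).T
    return float(np.linalg.svd(m, compute_uv=False)[0])


def demo_scores(seed=0, frames=500):
    """Return (genuine_score, attack_score) on synthetic stand-in features."""
    rng = np.random.default_rng(seed)
    # A shared latent "speech" signal drives both genuine streams.
    speech = rng.standard_normal((frames, 2))
    audio = speech @ rng.standard_normal((2, 3))
    audio += 0.1 * rng.standard_normal((frames, 3))
    genuine_visual = speech @ rng.standard_normal((2, 4))
    genuine_visual += 0.1 * rng.standard_normal((frames, 4))
    # Mismatched video, standing in for a replay attack: the visual stream
    # comes from an unrelated latent signal.
    other_speech = rng.standard_normal((frames, 2))
    attack_visual = other_speech @ rng.standard_normal((2, 4))
    attack_visual += 0.1 * rng.standard_normal((frames, 4))
    return (
        first_canonical_correlation(audio, genuine_visual),
        first_canonical_correlation(audio, attack_visual),
    )


if __name__ == "__main__":
    genuine, attack = demo_scores()
    print(f"genuine: {genuine:.3f}  attack: {attack:.3f}")
```

In this toy setting the genuine pair scores well above the mismatched pair, so a threshold on the score separates the two. Because the correlation is estimated on the same frames it is evaluated on, short sequences inflate the score of unrelated streams; a practical system would more likely learn the projections on separate training data.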

Notes

http://www.reallusion.com/crazytalk/

Acknowledgments

E. Boutellaa acknowledges the financial support of the Algerian MESRS and CDTA under grant number 060/PNE/ENS/FINLANDE/2014-2015. The support of the Academy of Finland and the Infotech Oulu Doctoral Program is also acknowledged.

Author information

Authors and Affiliations

Center for Machine Vision Research, Computer Science and Engineering, University of Oulu, Oulu, Finland
Elhocine Boutellaa, Zinelabidine Boulkenafet, Jukka Komulainen & Abdenour Hadid
Telecom Division, Centre de Développement des Technologies Avancées, Algiers, Algeria
Elhocine Boutellaa

Corresponding author

Correspondence to Elhocine Boutellaa.

About this article

Cite this article

Boutellaa, E., Boulkenafet, Z., Komulainen, J. et al. Audiovisual synchrony assessment for replay attack detection in talking face biometrics. Multimed Tools Appl 75, 5329–5343 (2016). https://doi.org/10.1007/s11042-015-2848-2

Received: 03 May 2015
Revised: 09 July 2015
Accepted: 30 July 2015
Published: 18 August 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11042-015-2848-2
