DOI: 10.1145/3664647.3688978

A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments

Published: 28 October 2024

Abstract

Over half of the world's population is bilingual, and people often communicate in multilingual scenarios. The Face-Voice Association in Multilingual Environments (FAME) 2024 Challenge, held at ACM Multimedia 2024, focuses on establishing face-voice associations in order to analyze the impact of multiple languages on the verification process. This report provides a brief summary of the challenge.
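
As a concrete illustration of the verification task described above, the following is a minimal, hypothetical sketch (not the official FAME 2024 baseline or protocol): face and voice embeddings, assumed to come from pretrained encoders that project both modalities into a shared space, are compared with cosine similarity, and the resulting scores are summarized with an equal error rate (EER), a standard verification metric.

```python
import numpy as np

def cosine_score(face_emb: np.ndarray, voice_emb: np.ndarray) -> float:
    """Cosine similarity between a face embedding and a voice embedding.

    Both embeddings are assumed to come from (hypothetical) pretrained
    encoders that map faces and voices into a shared space.
    """
    f = face_emb / np.linalg.norm(face_emb)
    v = voice_emb / np.linalg.norm(voice_emb)
    return float(np.dot(f, v))

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Equal error rate over verification scores.

    labels: 1 for genuine (same-identity) face-voice pairs, 0 for impostors.
    """
    eer, gap = 1.0, np.inf
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # false acceptance rate
        frr = np.mean(scores[labels == 1] < t)   # false rejection rate
        if abs(far - frr) < gap:
            gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

# Toy usage with random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=512), rng.normal(size=512), rng.integers(0, 2))
         for _ in range(200)]
scores = np.array([cosine_score(f, v) for f, v, _ in pairs])
labels = np.array([y for _, _, y in pairs])
print(f"EER on toy data: {equal_error_rate(scores, labels):.3f}")
```

The encoder choice, pairing protocol, and official metric for the challenge are defined in its evaluation plan; this sketch only shows the general shape of cross-modal verification scoring.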


Index Terms

  1. A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. face-voice association
      2. multimodal learning

      Qualifiers

      • Introduction

      Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

      Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
