DOI: 10.1145/3552466
DDAM '22: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia
ACM 2022 Proceedings
  • General Chairs:
  • Jianhua Tao,
  • Haizhou Li,
  • Helen Meng,
  • Dong Yu,
  • Masato Akagi,
  • Program Chairs:
  • Jiangyan Yi,
  • Cunhang Fan,
  • Ruibo Fu,
  • Shan Liang,
  • Pengyuan Zhang
Publisher:
  • Association for Computing Machinery, New York, NY, United States
Conference:
MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, 14 October 2022
ISBN:
978-1-4503-9496-3
Published:
10 October 2022
Sponsors:

Abstract

It is our great pleasure to welcome you to the 1st International Workshop on Deepfake Detection for Audio Multimedia - DDAM 2022. Audio deepfake detection is an emerging topic in the multimedia field and has been featured in challenges such as ASVspoof 2021. In this workshop, we aim to bring together researchers from the fields of audio deepfake detection, audio deep synthesis, audio fake game, and adversarial attacks to discuss recent research and future directions for detecting deepfake and manipulated audio in multimedia. To the best of our knowledge, this is the first workshop to focus on deepfake detection for audio multimedia.

SESSION: Keynote Talks
keynote
Lessons Learned from ASVSpoof and Remaining Challenges

Although speech technology reproducing an individual's voice is expected to bring new value to entertainment, it may cause security problems in speaker recognition systems if misused. In addition, there is a possibility of this technology being used for ...

SESSION: Session 1: Deepfake Audio Detection
research-article
Detection of Synthetic Speech Based on Spectrum Defects

Synthetic spoofing speech has become a threat to online communication and automatic speaker verification (ASV) systems based on deep learning since the synthetic model can produce anyone's voice. The first Audio Deep Synthesis Detection Challenge (ADD ...

research-article
Low-quality Fake Audio Detection through Frequency Feature Masking

The first Audio Deep Synthesis Detection Challenge (ADD 2022) was held, covering audio deepfake detection, audio deep synthesis, audio fake game, and adversarial attacks. Our team participated in track 1, classifying bona fide and fake ...

research-article
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Recently, pioneer research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance, and ...

research-article
Fully Automated End-to-End Fake Audio Detection

The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure. However, artificial adjustment of the parameters can have a relatively obvious ...

research-article
A Comparative Study on Physical and Perceptual Features for Deepfake Audio Detection

Audio content synthesis has stepped into a new era and brought a great threat to daily life since the development of deep learning techniques. The ASVSpoof Challenge and the ADD Challenge have been launched to motivate the development of Deepfake audio ...

research-article
Open Access
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion

This paper describes the deepfake audio detection system submitted to the Audio Deep Synthesis Detection (ADD) Challenge Track 3.2 and gives an analysis of score fusion. The proposed system is a score-level fusion of several light convolutional neural ...

SESSION: Session 2: Deepfake Audio Generation and Evaluation
research-article
Open Access
Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis

End-to-end singing voice synthesis (SVS) is attractive because it avoids pre-aligned data. However, it is difficult for the auto-learned alignment of the singing voice with the lyrics to match the duration information in a musical score, which will lead to the ...

research-article
Open Access
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

Many effective attempts have been made for fake audio detection. However, they can only provide detection results but no countermeasures to curb this harm. For many related practical applications, what model or algorithm generated the ...

research-article
Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech. With our submitted system, this paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio ...

research-article
Open Access
Acoustic or Pattern? Speech Spoofing Countermeasure based on Image Pre-training Models

Traditional speech spoofing countermeasures (CM) typically contain a frontend, which extracts a two-dimensional feature from the waveform, and a Convolutional Neural Network (CNN) based backend classifier. This pipeline is similar to an image ...

research-article
Open Access
Human Perception of Audio Deepfakes

The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research. Automatic detection of deepfakes has seen many new machine learning techniques. Human detection capabilities, however, are far ...

research-article
Open Access
Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion

Audio deep synthesis techniques have been able to generate high-quality speech whose authenticity is difficult for humans to recognize. Meanwhile, many anti-spoofing systems have been developed to capture artifacts in the synthesized speech that are ...

Contributors
  • Beijing National Research Center for Information Science and Technology
  • Tencent
  • Japan Advanced Institute of Science and Technology
  • Institute of Automation Chinese Academy of Sciences
  • Anhui University
  • Chinese Academy of Sciences
  • Institute of Acoustics Chinese Academy of Sciences


Acceptance Rates

DDAM '22 Paper Acceptance Rate: 12 of 14 submissions, 86%
Overall Acceptance Rate: 12 of 14 submissions, 86%

Year       Submitted  Accepted  Rate
DDAM '22   14         12        86%
Overall    14         12        86%