Alzheimer Disease Recognition Using Speech-Based Embeddings From Pre-Trained Models

Gauder, Lara; Pepino, Leonardo; Ferrer, Luciana; Riera, Pablo

doi:10.21437/Interspeech.2021-753

This paper describes our submission to the ADreSSo Challenge, which focuses on the problem of automatic recognition of Alzheimer’s Disease (AD) from speech. The audio samples contain speech from the subjects describing a picture with the guidance of an experimenter. Our approach to the problem is based on the use of embeddings extracted from different pre-trained models — trill, allosaurus, and wav2vec 2.0 — which were trained to solve different speech tasks. These features are modeled with a neural network that takes short segments of speech as input, generating an AD score per segment. The final score for an audio file is given by the average over all segments in the file. We include ablation results to show the performance of different feature types individually and in combination, a study of the effect of the segment size, and an analysis of statistical significance. Our results on the test data for the challenge reach an accuracy of 78.9%, outperforming both the acoustic and linguistic baselines provided by the organizers.
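The file-level scoring step described in the abstract (one AD score per segment, averaged over all segments in the file) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the assumption that scores lie in [0, 1], and the 0.5 decision threshold are all hypothetical choices made for illustration, and the per-segment scores are assumed to come from some already-trained model.

```python
from statistics import fmean
from typing import Sequence


def file_score(segment_scores: Sequence[float]) -> float:
    """Average the per-segment AD scores to obtain one score per audio file."""
    if not segment_scores:
        raise ValueError("an audio file must contribute at least one segment")
    return fmean(segment_scores)


def predict_label(score: float, threshold: float = 0.5) -> bool:
    """Return True (AD) when the file score reaches the threshold.

    The threshold value is an assumption for this sketch, not a value
    reported in the paper.
    """
    return score >= threshold


# Hypothetical per-segment scores for one recording.
scores = [0.2, 0.4, 0.9]
overall = file_score(scores)
label = predict_label(overall)
```

Averaging gives every segment equal weight, so a single confident segment cannot decide the file-level outcome on its own.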

Cite as: Gauder, L., Pepino, L., Ferrer, L., Riera, P. (2021) Alzheimer Disease Recognition Using Speech-Based Embeddings From Pre-Trained Models. Proc. Interspeech 2021, 3795-3799, doi: 10.21437/Interspeech.2021-753

@inproceedings{gauder21_interspeech,
  author={Lara Gauder and Leonardo Pepino and Luciana Ferrer and Pablo Riera},
  title={{Alzheimer Disease Recognition Using Speech-Based Embeddings From Pre-Trained Models}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3795--3799},
  doi={10.21437/Interspeech.2021-753}
}