DOI: 10.1145/3242840.3242873
research-article

Text Prompted Speaker Verification Based on Phoneme Clustering with Earth Mover's Distance and Cauchy-Schwarz Divergence

Published: 27 July 2018

ABSTRACT

In short-duration text-prompted speaker verification, the enrollment data available for each speaker model is limited, so it is hard to obtain a robust speaker representation. For such short utterances, i-vector/GMM approaches perform even worse than the traditional GMM-MAP modeling method. Content matching within the GMM/HMM framework is one of the state-of-the-art paradigms for short-duration text-dependent speaker verification: models for individual lexical units such as words, syllables, or phonemes are built for both the background and the speaker to compensate for mismatch. However, some phonemes do not occur in the enrollment data but do occur in the test recordings, and most phonemes have different preceding and succeeding phonemes, both of which lead to coarticulation differences. These two problems are called lexical mismatch and context mismatch. In this work, to overcome the lexical and context mismatch caused by data sparseness, phoneme states are clustered using Earth Mover's Distance and Cauchy-Schwarz divergence as metrics. With the Earth Mover's Distance metric, EER is lowered by 6.2% and minDCF08 by 1.9%; with the Cauchy-Schwarz divergence metric, EER is lowered by 3.7% while minDCF08 rises by 1.9%.
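To make the clustering metric concrete, the following is a minimal sketch of the Cauchy-Schwarz divergence between two phoneme states, under assumptions that the abstract does not state: each state is modeled here as a single diagonal-covariance Gaussian (the paper's states may be mixtures), and the state names, feature values, and the `closest_pair` helper are hypothetical. It uses the standard closed form for the integral of a product of two Gaussians. The Earth Mover's Distance is not shown, since it additionally requires solving a transportation problem between mixture components.

```python
import math
from itertools import combinations


def log_gauss_overlap(mean_p, var_p, mean_q, var_q):
    """Log of the integral of p(x) * q(x) dx for two diagonal Gaussians.

    The integral equals a Gaussian density N(mean_p; mean_q, var_p + var_q).
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        v = vp + vq
        total += math.log(2.0 * math.pi * v) + (mp - mq) ** 2 / v
    return -0.5 * total


def cs_divergence(mean_p, var_p, mean_q, var_q):
    """Cauchy-Schwarz divergence: -log( <p,q> / sqrt(<p,p> * <q,q>) )."""
    cross = log_gauss_overlap(mean_p, var_p, mean_q, var_q)
    self_p = log_gauss_overlap(mean_p, var_p, mean_p, var_p)
    self_q = log_gauss_overlap(mean_q, var_q, mean_q, var_q)
    return -cross + 0.5 * (self_p + self_q)


def closest_pair(states):
    """Return (divergence, name_a, name_b) for the two most similar states.

    Picking this pair would be the first merge of a bottom-up clustering
    (an assumed procedure, not necessarily the one used in the paper).
    """
    return min(
        (cs_divergence(*states[a], *states[b]), a, b)
        for a, b in combinations(sorted(states), 2)
    )


# Hypothetical toy states: name -> (mean vector, variance vector).
states = {
    "aa_s2": ([0.0, 1.0], [1.0, 1.0]),
    "ae_s2": ([0.1, 0.9], [1.0, 1.2]),
    "s_s2": ([3.0, -2.0], [0.5, 0.5]),
}

divergence, first, second = closest_pair(states)
print(first, second, round(divergence, 4))
```

The divergence is symmetric and is zero for identical states, so states with the smallest pairwise values are natural candidates for sharing a model when enrollment data for one of them is missing.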

Published in
ICACS '18: Proceedings of the 2nd International Conference on Algorithms, Computing and Systems
July 2018
245 pages
ISBN: 9781450365093
DOI: 10.1145/3242840

Copyright © 2018 ACM

Publisher
Association for Computing Machinery, New York, NY, United States

Qualifiers
research-article, Research, Refereed limited