DOI: 10.1145/3503161.3547731
demonstration

A Platform for Deploying the TFE Ecosystem of Automatic Speech Recognition

Published: 10 October 2022

Abstract

Since data regulations such as the European Union's General Data Protection Regulation (GDPR) took effect, the traditional two-step Automatic Speech Recognition (ASR) optimization strategy (i.e., training a one-size-fits-all model on the vendor's centralized data and then fine-tuning it on clients' private data) has become infeasible. To meet these privacy requirements, we previously proposed TFE, a novel GDPR-compliant ASR ecosystem that combines transfer learning, federated learning, and evolutionary learning for effective ASR model optimization. In this demonstration, we design and implement a platform that promotes the deployment and applicability of TFE. The platform allows enterprises to easily conduct ASR optimization with TFE across organizations.
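
As a rough illustration of the federated learning idea mentioned in the abstract, the sketch below averages model parameters from several clients, weighted by local dataset size, so that only parameters (not raw speech data) leave each organization. This is a generic federated-averaging sketch, not the TFE implementation: the function name `federated_average`, the flat list-of-floats parameter layout, and the example numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Hypothetical helper: each client trains locally on its private data
    and shares only its parameter vector with the aggregator.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client parameter vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its data volume.
            averaged[i] += w * size / total
    return averaged


# Illustrative values only: two clients, the second holding 3x more data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
print(global_weights)
```

In a real deployment the aggregated parameters would be sent back to the clients for another round of local training; that loop, and any privacy mechanism layered on top, is omitted here.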

Supplementary Material

MP4 File (MM-demo-video.mp4)
Presentation video



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. evolutionary learning
    2. federated learning
    3. speech recognition

    Qualifiers

    • Demonstration

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
