demonstration

A Platform for Deploying the TFE Ecosystem of Automatic Speech Recognition

Authors:

Rongzhong Lian,

Raymond Chi-Wing Wong

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 6952 - 6954

https://doi.org/10.1145/3503161.3547731

Published: 10 October 2022

Abstract

Since data regulations such as the European Union's General Data Protection Regulation (GDPR) took effect, the traditional two-step Automatic Speech Recognition (ASR) optimization strategy (i.e., training a one-size-fits-all model on the vendor's centralized data and then fine-tuning it on clients' private data) has become infeasible. To meet these privacy requirements, we previously proposed TFE, a novel GDPR-compliant ASR ecosystem that combines transfer learning, federated learning, and evolutionary learning for effective ASR model optimization. In this demonstration, we design and implement a novel platform that promotes the deployment and applicability of TFE. The platform allows enterprises to easily carry out ASR optimization with TFE across organizations.

Supplementary Material

MP4 File (MM-demo-video.mp4)

Presentation video (114.52 MB)

Cited By

Asad M, Shaukat S, Javanmardi E, Nakazato J, Tsukada M. (2023). A Comprehensive Survey on Privacy-Preserving Techniques in Federated Recommendation Systems. Applied Sciences 13:10 (6201). Online publication date: 18-May-2023.
https://doi.org/10.3390/app13106201

Index Terms

A Platform for Deploying the TFE Ecosystem of Automatic Speech Recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Demonstration

Funding Sources

Innovation and Technology Commission

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
