research-article

Emotion Filtering at the Edge

Authors:

David Boyle

SenSys-ML 2019: Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems

Pages 1 - 6

https://doi.org/10.1145/3362743.3362960

Published: 10 November 2019

Abstract

Voice-controlled devices and services have become very popular in the consumer IoT. Cloud-based speech analysis services extract information from voice inputs using speech recognition techniques. Service providers can thus build very accurate profiles of users' demographic categories, personal preferences, emotional states, etc., and may therefore significantly compromise their privacy. To address this problem, we have developed a privacy-preserving intermediate layer between users and cloud services that sanitizes voice input directly on edge devices. We use CycleGAN-based speech conversion to remove sensitive information from raw voice input signals and then regenerate neutralized signals for forwarding. We implement and evaluate our emotion filtering approach on a relatively cheap Raspberry Pi 4 and show that accuracy is not compromised at the edge: for speech recognition, signals generated at the edge differ only slightly (~0.16%) from those produced by cloud-based approaches. Experimental evaluation of the generated signals shows that identification of a speaker's emotional state can be reduced by ~91%.
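The abstract does not spell out the conversion objective. As a rough, non-authoritative illustration, the sketch below computes a simplified generator-side loss of the kind CycleGAN-style voice conversion optimizes: least-squares adversarial, cycle-consistency, and identity-mapping terms, as in CycleGAN (Zhu et al., 2017) and CycleGAN-VC2 (Kaneko et al., 2019). It assumes, hypothetically, that domain X is emotional speech and domain Y is neutral speech, each represented as lists of acoustic feature frames (for example mel-cepstral coefficients from a vocoder). The generators, discriminators, and loss weights are illustrative placeholders, not the authors' model or settings.

```python
from typing import Callable, List

# Hypothetical representation: a sequence of acoustic feature frames
# (e.g. mel-cepstral coefficients), one list of floats per frame.
Frames = List[List[float]]
Generator = Callable[[Frames], Frames]
Discriminator = Callable[[Frames], List[float]]  # one "realness" score per frame


def l1(a: Frames, b: Frames) -> float:
    """Mean absolute error between two equally shaped frame sequences."""
    diffs = [abs(p - q) for fa, fb in zip(a, b) for p, q in zip(fa, fb)]
    return sum(diffs) / len(diffs)


def lsgan_generator_loss(scores: List[float]) -> float:
    """Least-squares adversarial loss: the generator wants every score to be 1."""
    return sum((s - 1.0) ** 2 for s in scores) / len(scores)


def cyclegan_generator_loss(
    x: Frames,          # batch from domain X (assumed: emotional speech)
    y: Frames,          # batch from domain Y (assumed: neutral speech)
    g_xy: Generator,    # X -> Y, the "neutralizing" direction
    g_yx: Generator,    # Y -> X
    d_x: Discriminator,
    d_y: Discriminator,
    lambda_cyc: float = 10.0,  # illustrative weight (assumption)
    lambda_id: float = 5.0,    # illustrative weight (assumption)
) -> float:
    fake_y = g_xy(x)
    fake_x = g_yx(y)
    # Adversarial terms: converted features should fool the target-domain discriminator.
    adv = lsgan_generator_loss(d_y(fake_y)) + lsgan_generator_loss(d_x(fake_x))
    # Cycle consistency: X -> Y -> X and Y -> X -> Y should reconstruct the input.
    cyc = l1(g_yx(fake_y), x) + l1(g_xy(fake_x), y)
    # Identity mapping: input already in a generator's target domain should pass unchanged.
    idt = l1(g_xy(y), y) + l1(g_yx(x), x)
    return adv + lambda_cyc * cyc + lambda_id * idt


if __name__ == "__main__":
    # Toy, hypothetical components: scaling "generators" and a discriminator
    # that rates everything as real.
    halve = lambda f: [[v * 0.5 for v in fr] for fr in f]
    double = lambda f: [[v * 2.0 for v in fr] for fr in f]
    always_real = lambda f: [1.0 for _ in f]
    loss = cyclegan_generator_loss(
        [[1.0, 2.0]], [[2.0, 4.0]], halve, double, always_real, always_real
    )
    print(loss)  # 15.0: cycles are exact, so only the identity terms contribute
```

The cycle-consistency term is what makes training on non-parallel recordings possible: no emotional/neutral pair of the same utterance is required, only that converting to neutral and back recovers the input.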

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
2. Security and privacy

Recommendations

Paralinguistic Privacy Protection at the Edge
Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and ...
Privacy-preserving Voice Analysis via Disentangled Representations
CCSW'20: Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop

Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient user experience, VUIs raise new security and privacy concerns for their ...
Privacy preserving speech analysis using emotion filtering at the edge: poster abstract
SenSys '19: Proceedings of the 17th Conference on Embedded Networked Sensor Systems

Voice-controlled devices and services are commonplace in consumer IoT. Cloud-based analysis services extract information from voice input using speech recognition techniques. Service providers can build detailed profiles of users' demographics, ...

Information & Contributors

Information

Published In

SenSys-ML 2019: Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems

November 2019

47 pages

ISBN:9781450370110

DOI:10.1145/3362743

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 November 2019

Qualifiers

Research-article
Research
Refereed limited

Conference

SenSys '19

Sponsor:

SenSys '19: The 17th ACM Conference on Embedded Networked Sensor Systems

November 10, 2019

NY, New York, USA

Acceptance Rates

SenSys-ML 2019 Paper Acceptance Rate 7 of 14 submissions, 50%;

Overall Acceptance Rate 7 of 14 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 290

Downloads (last 12 months): 14
Downloads (last 6 weeks): 4

Reflects downloads up to 20 Feb 2025

Citations

Cited By

Fragkou E, Katsaros D (2024). A Joint Survey in Decentralized Federated Learning and TinyML: A Brief Introduction to Swarm Learning. Future Internet 16:11 (413). Online publication date: 8-Nov-2024
https://doi.org/10.3390/fi16110413
Yadav P (2024). Advancements in Machine Learning in Sensor Systems: Insights from Sensys-ML and TinyML Communities. 2024 IEEE 3rd Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML) (21-26). Online publication date: 13-May-2024
https://doi.org/10.1109/SenSys-ML62579.2024.00009
Teixeira F, Abad A, Raj B, Trancoso I (2024). Privacy-Oriented Manipulation of Speaker Representations. IEEE Access 12 (82949-82971). Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3409067
Testa B, Xiao Y, Sharma H, Gump A, Salekin A (2023). Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:3 (1-30). Online publication date: 27-Sep-2023
https://dl.acm.org/doi/10.1145/3610887
Zhang S, Li Z, Das A, Boureanu I, Schneider S, Reaves B, Tippenhauer N (2023). VoicePM: A Robust Privacy Measurement on Voice Anonymity. Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks (215-226). Online publication date: 29-May-2023
https://dl.acm.org/doi/10.1145/3558482.3590175
Saini S, Saxena N (2023). Speaker Anonymity and Voice Conversion Vulnerability: A Speaker Recognition Analysis. 2023 IEEE Conference on Communications and Network Security (CNS) (1-9). Online publication date: 2-Oct-2023
https://doi.org/10.1109/CNS59707.2023.10289030
Aloufi R, Haddadi H, Boyle D (2022). Paralinguistic Privacy Protection at the Edge. ACM Transactions on Privacy and Security 26:2 (1-27). Online publication date: 3-Nov-2022
https://dl.acm.org/doi/10.1145/3570161
Nukavarapu S, Ayyat M, Nadeem T (2022). MirageNet - Towards a GAN-based Framework for Synthetic Network Traffic Generation. GLOBECOM 2022 - 2022 IEEE Global Communications Conference (3089-3095). Online publication date: 4-Dec-2022
https://doi.org/10.1109/GLOBECOM48099.2022.10001494
Santos V, Parreira W, Fernandes A, Garcia Ovejero R, Leithardt V (2022). Improving Speaker Recognition in Environmental Noise With Adaptive Filter. IEEE Access 10 (124523-124533). Online publication date: 2022
https://doi.org/10.1109/ACCESS.2022.3225405
Aloufi R, Haddadi H, Boyle D, Zhang Y, Sion R (2020). Privacy-preserving Voice Analysis via Disentangled Representations. Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop (1-14). Online publication date: 9-Nov-2020
https://dl.acm.org/doi/10.1145/3411495.3421355