research-article

Keyword Spotting in the Homomorphic Encrypted Domain Using Deep Complex-Valued CNN

Authors:
Peijia Zheng (Sun Yat-Sen University & Zhengzhou Xinda Institute of Advanced Technology, Guangzhou, China)
Zhiwei Cai (Sun Yat-Sen University, Guangzhou, China)
Huicong Zeng (Sun Yat-Sen University, Guangzhou, China)
Jiwu Huang (Shenzhen University & Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China)

MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 1474–1483. https://doi.org/10.1145/3503161.3548350

Published: 10 October 2022

ABSTRACT

In this paper, we propose a non-interactive scheme to achieve end-to-end keyword spotting in the homomorphic encrypted domain using deep learning techniques. We design a complex-valued convolutional neural network (CNN) structure for encrypted-domain keyword spotting that takes full advantage of the limited multiplicative depth. At the same depth, the proposed complex-valued CNN can learn more speech representations than a real-valued CNN, thus achieving higher accuracy in keyword spotting. The complex activation function of the complex-valued CNN is non-arithmetic and therefore cannot be evaluated directly under homomorphic encryption. To implement the complex activation function in the encrypted domain without interaction, we design methods to approximate complex activation functions with low-degree polynomials while preserving the keyword spotting performance. Our scheme supports single-instruction multiple-data (SIMD) processing, which reduces the total size of ciphertexts and improves computational efficiency. We conduct extensive experiments to evaluate performance with various metrics, such as accuracy, robustness, and F1-score. The experimental results show that our approach significantly outperforms the state-of-the-art solutions on every metric.
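
To make the polynomial-approximation idea concrete, the following is a minimal plaintext sketch, not the authors' method. The abstract does not say which complex activation is used, how the polynomial is fitted, or over which input range, so everything here is an assumption for illustration: a split ReLU applied separately to the real and imaginary parts, a least-squares fit of degree 2 on the interval [-4, 4], and the hypothetical helper names `fit_relu_poly` and `poly_crelu`. The fitted polynomial uses only additions and multiplications, which is the kind of arithmetic a homomorphic scheme can evaluate, and it is applied elementwise to a whole vector in the spirit of slot-wise SIMD processing.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_relu_poly(degree=2, bound=4.0, samples=2001):
    """Least-squares fit of ReLU on [-bound, bound] (assumed range).

    Returns coefficients ordered from the constant term upward.
    """
    x = np.linspace(-bound, bound, samples)
    relu = np.maximum(x, 0.0)
    return P.polyfit(x, relu, degree)


def poly_crelu(z, coeffs):
    """Apply the fitted polynomial to real and imaginary parts separately.

    This mimics a split complex ReLU using only additions and
    multiplications; it runs on plaintext NumPy arrays, not ciphertexts.
    """
    return P.polyval(z.real, coeffs) + 1j * P.polyval(z.imag, coeffs)


if __name__ == "__main__":
    coeffs = fit_relu_poly(degree=2, bound=4.0)
    # One vector of complex values stands in for the slots of one ciphertext.
    z = np.array([1.0 - 2.0j, -3.0 + 0.5j, 0.25 + 3.5j])
    exact = np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)
    approx = poly_crelu(z, coeffs)
    print("coefficients:", coeffs)
    print("max abs error:", np.max(np.abs(approx - exact)))
```

In an encrypted pipeline, the same polynomial would be evaluated with the homomorphic library's own addition and multiplication routines. A lower degree consumes less multiplicative depth but gives a coarser approximation, and the fit is only meaningful for inputs that stay inside the assumed interval.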

Supplemental Material

MM22-fp2732-updated.mp4 (mp4, 194.7 MB)

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections
  2. Security services
    1. Privacy-preserving protocols


Published in
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161
General Chairs: João Magalhães (NOVA University of Lisbon, Portugal), Alberto del Bimbo (University of Florence, Italy), Shin'ichi Satoh (National Institute of Informatics, Japan), Nicu Sebe (University of Trento, Italy)
Program Chairs: Xavier Alameda-Pineda (Inria, Grenoble, France), Qin Jin (Renmin University of China, China), Vincent Oria (New Jersey Institute of Technology, USA), Laura Toni (University College London, UK)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
complex-valued cnn
deep learning
homomorphic encryption
keyword spotting
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%