ABSTRACT
In this paper, we propose a non-interactive scheme for end-to-end keyword spotting in the homomorphic encrypted domain using deep learning techniques. We design a complex-valued convolutional neural network (CNN) structure for encrypted-domain keyword spotting that takes full advantage of the limited multiplicative depth. At the same depth, the proposed complex-valued CNN can learn more speech representations than a real-valued CNN, and thus achieves higher keyword spotting accuracy. The activation function of a complex-valued CNN is non-arithmetic, so homomorphic encryption cannot evaluate it directly. To implement it in the encrypted domain without interaction, we design methods that approximate complex activation functions with low-degree polynomials while preserving keyword spotting performance. Our scheme supports single-instruction multiple-data (SIMD) processing, which reduces the total size of the ciphertexts and improves computational efficiency. We conduct extensive experiments to evaluate the scheme on several metrics, including accuracy, robustness, and F1-score. The experimental results show that our approach significantly outperforms state-of-the-art solutions on every metric.
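To make the polynomial-approximation idea concrete, the following is a minimal plaintext sketch, not the method of the paper. It assumes a CReLU-style activation (ReLU applied separately to the real and imaginary parts), a degree-2 least-squares fit, and a fitting interval of [-4, 4]; the abstract specifies none of these, and the function names `fit_poly_relu` and `poly_crelu` are hypothetical.

```python
import numpy as np

def fit_poly_relu(degree=2, interval=(-4.0, 4.0), samples=2001):
    """Least-squares fit of a low-degree polynomial to ReLU on an interval.

    The degree and interval are illustrative assumptions.
    Returns coefficients with the highest power first (np.polyfit order).
    """
    x = np.linspace(interval[0], interval[1], samples)
    return np.polyfit(x, np.maximum(x, 0.0), degree)

def poly_crelu(z, coeffs):
    """Apply the fitted polynomial to real and imaginary parts separately.

    Only additions and multiplications are used, which is what makes the
    function expressible with homomorphic operations.
    """
    return np.polyval(coeffs, z.real) + 1j * np.polyval(coeffs, z.imag)

coeffs = fit_poly_relu()

# Compare against the exact (non-arithmetic) activation on sample inputs.
z = np.array([1.5 - 2.0j, -0.5 + 3.0j])
approx = poly_crelu(z, coeffs)
exact = np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)
max_err = np.max(np.abs(approx - exact))
print("coefficients:", coeffs)
print("max abs error on samples:", max_err)
```

A degree-2 polynomial needs only one ciphertext-by-ciphertext multiplication (the square), which is why low degrees matter when the multiplicative depth is limited. The approximation holds only inside the fitting interval, so the inputs to the activation would need to stay within it.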
Index Terms
- Keyword Spotting in the Homomorphic Encrypted Domain Using Deep Complex-Valued CNN