DOI: 10.1145/3503161.3548350
research-article

Keyword Spotting in the Homomorphic Encrypted Domain Using Deep Complex-Valued CNN

Published: 10 October 2022

ABSTRACT

In this paper, we propose a non-interactive scheme for end-to-end keyword spotting in the homomorphic encrypted domain using deep learning techniques. We carefully design a complex-valued convolutional neural network (CNN) structure for encrypted-domain keyword spotting that takes full advantage of the limited multiplicative depth available under homomorphic encryption. At the same multiplicative depth, the proposed complex-valued CNN can learn more speech representations than a real-valued CNN, and thus achieves higher keyword spotting accuracy. However, the complex activation function of the complex-valued CNN is non-arithmetic, i.e., it cannot be expressed with additions and multiplications alone, so homomorphic encryption cannot evaluate it directly. To implement the complex activation function in the encrypted domain without interaction, we design methods that approximate complex activation functions with low-degree polynomials while preserving keyword spotting performance. Our scheme supports single-instruction multiple-data (SIMD) operation, which reduces the total size of ciphertexts and improves computational efficiency. We conducted extensive experiments to evaluate our scheme on various metrics, such as accuracy, robustness, and F1-score. The experimental results show that our approach significantly outperforms state-of-the-art solutions on every metric.
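The abstract does not spell out the layer design, so the following is only a minimal plaintext sketch, under our own assumptions, of why a complex-valued layer can be made homomorphic-encryption friendly. A complex convolution expands into real multiply-adds via (a + ib)(c + id) = (ac - bd) + i(ad + bc), and the non-arithmetic activation is replaced by a low-degree polynomial. We assume a CReLU-style activation (ReLU applied separately to the real and imaginary parts; the paper's activation may differ), a degree-2 least-squares fit on an assumed input range [-4, 4], and a 1-D convolution. The helper names (`fit_poly`, `poly_crelu`, `complex_conv1d`) are hypothetical, and no encryption is performed: the point is only that the forward pass uses nothing but additions and multiplications, which is what an approximate HE scheme such as CKKS can evaluate.

```python
# Plaintext sketch only: no encryption is performed here.
# Assumptions (not from the paper): CReLU-style activation, degree-2
# least-squares polynomial fit on [-4, 4], 1-D "valid" convolution.


def relu(x):
    return max(0.0, x)


def solve(a, b):
    """Solve the small dense system a @ c = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_poly(f, degree, bound, samples=201):
    """Least-squares fit of f on [-bound, bound]; returns [c0, c1, ..., c_degree]."""
    xs = [-bound + 2.0 * bound * k / (samples - 1) for k in range(samples)]
    # Normal equations (V^T V) c = V^T y, with Vandermonde entries V[k][j] = xs[k] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(degree + 1)]
           for i in range(degree + 1)]
    aty = [sum(f(x) * x ** i for x in xs) for i in range(degree + 1)]
    return solve(ata, aty)


def poly_eval(coeffs, x):
    """Horner's rule: only additions and multiplications."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def poly_crelu(z, coeffs):
    """Hypothetical polynomial stand-in for CReLU: approximate ReLU on each part."""
    return complex(poly_eval(coeffs, z.real), poly_eval(coeffs, z.imag))


def complex_conv1d(x, w):
    """'Valid' 1-D complex cross-correlation, as computed by a CNN layer.

    Each product uses (a + ib)(c + id) = (ac - bd) + i(ad + bc),
    i.e. plain real multiply-adds, operations an HE scheme supports.
    """
    out = []
    for t in range(len(x) - len(w) + 1):
        re = im = 0.0
        for k, wk in enumerate(w):
            xr, xi = x[t + k].real, x[t + k].imag
            re += wk.real * xr - wk.imag * xi
            im += wk.real * xi + wk.imag * xr
        out.append(complex(re, im))
    return out


# Usage: one complex convolution followed by the polynomial activation on toy data.
coeffs = fit_poly(relu, degree=2, bound=4.0)
x = [complex(0.5, -1.0), complex(1.5, 0.25), complex(-0.75, 2.0), complex(1.0, 1.0)]
w = [complex(0.3, 0.1), complex(-0.2, 0.4)]
h = complex_conv1d(x, w)
y = [poly_crelu(z, coeffs) for z in h]
print(coeffs)
print(y)
```

Because ReLU(x) = (x + |x|)/2, the fitted linear coefficient is 0.5 (up to rounding) on a symmetric interval, and the largest error sits near x = 0. Inputs outside the fitting range are approximated poorly, so in practice the range has to match the layer's pre-activation statistics.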



Published in

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%
