A retrieval algorithm for encrypted speech based on convolutional neural network and deep hashing

Zhang, Qiu-yu; Li, Yu-zhou; Hu, Ying-jie

doi:10.1007/s11042-020-09748-y

Published: 07 September 2020

Volume 80, pages 1201–1221, (2021)
Multimedia Tools and Applications


Abstract

In this paper, we propose a retrieval algorithm for encrypted speech based on a convolutional neural network (CNN) and deep hashing. It addresses the feature extraction defects of existing content-based encrypted speech retrieval methods and the low retrieval accuracy caused by the high dimensionality and temporal nature of audio data. Firstly, the original speech is encrypted with a three-dimensional chaotic encryption algorithm and uploaded to the encrypted speech library in the cloud. Since a CNN can capture the basic semantic structure of speech data well, we use a CNN as a feature extractor to extract deep features from the Log-Mel spectrogram/MFCC. Batch normalization is introduced in the training process, which speeds up network fitting, reduces training time, and improves the retrieval efficiency of the system. Secondly, the deep features extracted by the CNN are combined with a hash function to construct the system's hashing index table. Finally, retrieval is performed with the normalized Hamming distance algorithm. The experimental results show that the proposed algorithm has better discrimination and better robustness to amplitude changes than existing methods. The proposed algorithm also achieves high recall, precision, and retrieval efficiency after various content-preserving operations.
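The last two steps of the abstract (building a hashing index table from deep features, then matching by normalized Hamming distance) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `binarize`, `normalized_hamming`, and `retrieve`, the fixed 0.5 threshold, the 0.25 match threshold, and the toy 8-value feature vectors are all assumptions made for illustration, and the CNN feature extractor and the chaotic encryption are not modeled here.

```python
from typing import Dict, List, Sequence, Tuple


def binarize(features: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn a real-valued feature vector into a binary hash code.

    Assumption: a fixed threshold is used. The abstract does not
    specify the paper's actual hash function.
    """
    return [1 if value >= threshold else 0 for value in features]


def normalized_hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    if len(a) == 0:
        raise ValueError("hash codes must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def retrieve(
    query: Sequence[int],
    index: Dict[str, List[int]],
    max_distance: float = 0.25,  # hypothetical match threshold
) -> List[Tuple[str, float]]:
    """Return (speech_id, distance) pairs within max_distance, closest first."""
    hits = [(sid, normalized_hamming(query, code)) for sid, code in index.items()]
    hits = [(sid, dist) for sid, dist in hits if dist <= max_distance]
    return sorted(hits, key=lambda hit: hit[1])


# Toy "deep features" standing in for CNN outputs (hypothetical values).
index = {
    "clip_a": binarize([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]),
    "clip_b": binarize([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]),
}
query = binarize([0.8, 0.2, 0.9, 0.1, 0.6, 0.6, 0.7, 0.3])
print(retrieve(query, index))  # [('clip_a', 0.125)]
```

Here the query differs from `clip_a` in one of eight bits (distance 0.125) and from `clip_b` in seven (0.875), so only `clip_a` falls under the threshold. Normalizing by code length keeps the threshold meaningful when the hash length changes.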

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61862041 and 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Authors and Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Qiu-yu Zhang, Yu-zhou Li & Ying-jie Hu

Corresponding author

Correspondence to Qiu-yu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Qy., Li, Yz. & Hu, Yj. A retrieval algorithm for encrypted speech based on convolutional neural network and deep hashing. Multimed Tools Appl 80, 1201–1221 (2021). https://doi.org/10.1007/s11042-020-09748-y

Received: 08 July 2019
Revised: 17 July 2020
Accepted: 27 August 2020
Published: 07 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11042-020-09748-y
