A retrieval algorithm for encrypted speech based on convolutional neural network and deep hashing

Multimedia Tools and Applications

Abstract

In this paper, we propose a retrieval algorithm for encrypted speech based on a convolutional neural network (CNN) and deep hashing. It addresses the feature extraction defects of existing content-based encrypted speech retrieval methods and the low retrieval accuracy caused by the high dimensionality and temporal nature of audio data. First, the original speech is encrypted with a three-dimensional chaotic encryption algorithm and uploaded to the encrypted speech library in the cloud. Because a CNN can capture the basic semantic structure of speech data well, we use a CNN as a feature extractor to obtain deep features from the Log-Mel spectrogram/MFCC. Batch normalization is introduced during training, which speeds up network fitting, reduces training time, and improves the retrieval efficiency of the system. Second, the deep features extracted by the CNN are combined with a hash function to construct the system's hash index table. Finally, retrieval is performed using the normalized Hamming distance. Experimental results show that the proposed algorithm offers better discrimination and robustness to amplitude change than existing methods. The proposed algorithm also achieves high recall, precision, and retrieval efficiency after various content-preserving operations.
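The last two steps of the abstract (hash index construction and normalized Hamming distance matching) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash function, so the sign-threshold binarization, the function names `binarize`, `normalized_hamming` and `retrieve`, and the `max_distance` value of 0.25 are all assumptions chosen for illustration. The feature vectors stand in for deep features that a CNN would produce.

```python
import numpy as np


def binarize(features, threshold=0.0):
    """Map a real-valued feature vector to a binary hash code.

    Assumption: a simple fixed threshold is used here; the paper's
    actual hash function is not given in the abstract.
    """
    features = np.asarray(features, dtype=float)
    return (features > threshold).astype(np.uint8)


def normalized_hamming(a, b):
    """Fraction of bit positions in which two equal-length codes differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("hash codes must have the same length")
    return float(np.count_nonzero(a != b)) / a.size


def retrieve(query_code, index_table, max_distance=0.25):
    """Return (key, distance) pairs within max_distance, closest first.

    index_table is a hypothetical dict mapping an identifier of an
    encrypted speech clip to its stored binary hash code.
    """
    hits = []
    for key, code in index_table.items():
        d = normalized_hamming(query_code, code)
        if d <= max_distance:
            hits.append((key, d))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 4-dimensional "deep features"; real codes would be much longer.
    index_table = {
        "clip_a": binarize([0.9, -0.2, 0.4, -0.7]),
        "clip_b": binarize([-0.9, 0.2, -0.4, 0.7]),
    }
    query = binarize([0.8, -0.1, 0.3, 0.6])
    print(retrieve(query, index_table))
```

Because matching is done on the hash codes alone, a scheme of this kind can compare a query against the index without decrypting the stored speech; the threshold trades recall against precision and would need to be tuned on real data.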

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61862041, 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Corresponding author

Correspondence to Qiu-yu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Qy., Li, Yz. & Hu, Yj. A retrieval algorithm for encrypted speech based on convolutional neural network and deep hashing. Multimed Tools Appl 80, 1201–1221 (2021). https://doi.org/10.1007/s11042-020-09748-y

  • DOI: https://doi.org/10.1007/s11042-020-09748-y
