A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing

Zhang, Qiu-yu; Zhou, Liang; Zhang, Tao; Zhang, Deng-hai

doi:10.1007/s11042-019-7180-9

Published: 15 January 2019

Volume 78, pages 17825–17846, (2019)

Multimedia Tools and Applications

Qiu-yu Zhang ORCID: orcid.org/0000-0003-1488-388X¹,
Liang Zhou¹,
Tao Zhang¹ &
Deng-hai Zhang¹

Abstract

To extract perceptual features directly from encrypted speech and use them as a search digest for content-based retrieval of encrypted speech, this paper presents a retrieval algorithm for encrypted speech based on short-term cross-correlation and perceptual hashing. First, the speech file is encrypted and the encrypted speech data are uploaded to the encrypted speech database on a cloud server. Second, sample speech clips are cut from the speech file and encrypted by scrambling. The perceptual hashing sequence of each encrypted clip is constructed from the short-term cross-correlation of the encrypted speech signal and serves as the search digest. These perceptual hashing sequences are uploaded to the hashing index table on the cloud server. Finally, during a search, the query is matched against the index table using the Hamming distance. The experimental results show that the proposed perceptual hashing algorithm for encrypted speech has better discrimination, robustness and compactness, and that the perceptual hashing sequences can be extracted directly from the encrypted sample speech. The encrypted speech signal also retains high recall and precision ratios after various content-preserving operations. The whole retrieval process requires neither downloading nor decrypting the speech data.
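The hashing and matching steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the feature definition, frame length, binarization rule or match threshold, so everything below is an assumption made for illustration: the helper names, the zero-lag normalized cross-correlation between adjacent frames, the median threshold, the 256-sample frames and the 0.25 distance cutoff are all hypothetical, and the scrambling encryption step is not modeled. It is not the authors' implementation.

```python
import random
from statistics import median


def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames, dropping any tail remainder."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def zero_lag_cross_correlation(a, b):
    """Normalized zero-lag cross-correlation of two equal-length frames."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den > 0 else 0.0


def perceptual_hash(samples, frame_len=256):
    """Assumed rule: one bit per adjacent-frame pair, 1 if above the median."""
    frames = frame_signal(samples, frame_len)
    feats = [zero_lag_cross_correlation(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    if not feats:
        return []
    threshold = median(feats)
    return [1 if f > threshold else 0 for f in feats]


def normalized_hamming(h1, h2):
    """Fraction of differing bits between two equal-length hash sequences."""
    if len(h1) != len(h2):
        raise ValueError("hash sequences must have equal length")
    if not h1:
        return 0.0
    return sum(b1 != b2 for b1, b2 in zip(h1, h2)) / len(h1)


def retrieve(query_hash, index_table, threshold=0.25):
    """Return (clip_id, distance) of the closest entry, or None if too far."""
    candidates = ((cid, normalized_hamming(query_hash, h))
                  for cid, h in index_table.items())
    best = min(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] > threshold:
        return None
    return best


# Synthetic stand-in data: random noise instead of real (encrypted) speech.
rng = random.Random(0)
clips = {name: [rng.uniform(-1.0, 1.0) for _ in range(256 * 65)]
         for name in ("clip_a", "clip_b")}
index_table = {name: perceptual_hash(s) for name, s in clips.items()}

# A volume-scaled copy of clip_a plays the role of a content-preserving operation.
query = [0.5 * x for x in clips["clip_a"]]
match = retrieve(perceptual_hash(query), index_table)
```

In this sketch only the short hash sequences are compared, which mirrors the point in the abstract that matching works on the index table rather than on downloaded or decrypted speech data.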

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61862041, No. 61363078) and the Research Project in Universities of Education Department of Gansu Province (2017B-16, 2018A-187). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

Author information

Authors and Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Qiu-yu Zhang, Liang Zhou, Tao Zhang & Deng-hai Zhang

Corresponding author

Correspondence to Qiu-yu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Qy., Zhou, L., Zhang, T. et al. A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing. Multimed Tools Appl 78, 17825–17846 (2019). https://doi.org/10.1007/s11042-019-7180-9

Received: 01 July 2018
Revised: 28 December 2018
Accepted: 06 January 2019
Published: 15 January 2019
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11042-019-7180-9
