Abstract
In security surveillance, remotely acquiring the sound signal of a target is an attractive research topic with broad application prospects, including counter-terrorism, rescue, and medical monitoring. To obtain a clear and accurate sound signal from the target, we propose a sound recovery method based on a convolutional LSTM network. The method consists of two steps. First, we remotely record speckle images of the target. Then we use the convolutional LSTM network to extract the subtle movement from the speckle images. The results demonstrate that our network is superior to a convolutional neural network in both the accuracy and the efficiency of processing spatio-temporal speckle image data. Appropriately designed experiments reveal the influence of different sampling rates on sound extraction. In addition, we explain why our network has stronger generalization ability than a convolutional neural network. Benefiting from this generalization ability, our method performs accurate and robust sound extraction on unseen objects. This performance indicates that the method is a significant development in the field of remote sound acquisition.
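To make the second step more concrete, the following is a minimal sketch of how a convolutional LSTM cell could carry a hidden state across a sequence of speckle frames. It is not the network described in the chapter: the function names (`conv2d_same`, `convlstm_step`, `run_sequence`), the hidden size, the kernel size, the random weights, and the random stand-in frames are all assumptions made for illustration. The cell follows the commonly used ConvLSTM formulation without peephole connections, and the step that maps the final hidden state to a displacement or audio sample is omitted.

```python
import numpy as np

def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k); zero padding keeps H x W (k odd)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, shape (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h_prev, c_prev, w_x, w_h, b):
    """One ConvLSTM update; the four gates are stacked on the channel axis as [i, f, o, g]."""
    gates = conv2d_same(x, w_x) + conv2d_same(h_prev, w_h) + b[:, None, None]
    i, f, o, g = np.split(gates, 4, axis=0)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # cell state keeps its spatial layout
    h = sigmoid(o) * np.tanh(c)
    return h, c

def run_sequence(frames, hidden=4, k=3, seed=0):
    """Run untrained, randomly initialised weights over frames of shape (T, C_in, H, W)."""
    rng = np.random.default_rng(seed)
    _, c_in, hgt, wid = frames.shape
    w_x = rng.normal(0.0, 0.1, (4 * hidden, c_in, k, k))
    w_h = rng.normal(0.0, 0.1, (4 * hidden, hidden, k, k))
    b = np.zeros(4 * hidden)
    h = np.zeros((hidden, hgt, wid))
    c = np.zeros_like(h)
    for frame in frames:
        h, c = convlstm_step(frame, h, c, w_x, w_h, b)
    return h

# Hypothetical stand-in for a short speckle clip: 5 grayscale frames of 16 x 16 pixels.
frames = np.random.default_rng(1).random((5, 1, 16, 16))
h_last = run_sequence(frames)
print(h_last.shape)  # (4, 16, 16)
```

Because the gates are computed by convolutions rather than dense matrix products, the hidden state remains a stack of feature maps, so the recurrence tracks how local speckle structure shifts from frame to frame. A trained model would learn `w_x`, `w_h`, and `b` from data and add an output layer; neither is shown here.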
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, D., Yang, L., Zeng, H. (2021). Remote Recovery of Sound from Speckle Pattern Video Based on Convolutional LSTM. In: Gao, D., Li, Q., Guan, X., Liao, X. (eds) Information and Communications Security. ICICS 2021. Lecture Notes in Computer Science, vol 12919. Springer, Cham. https://doi.org/10.1007/978-3-030-88052-1_7
DOI: https://doi.org/10.1007/978-3-030-88052-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88051-4
Online ISBN: 978-3-030-88052-1
eBook Packages: Computer Science, Computer Science (R0)