Remote Recovery of Sound from Speckle Pattern Video Based on Convolutional LSTM

  • Conference paper
Information and Communications Security (ICICS 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12919)

Abstract

In the field of security surveillance, remotely acquiring the sound signal of a target is an attractive research topic with broad application prospects, such as counter-terrorism, rescue, and medical monitoring. To obtain a clear and accurate sound signal from the target, we propose a sound recovery method based on a convolutional LSTM network. The method consists of two steps. First, we remotely record speckle images of the target. Then we use the convolutional LSTM network to extract the subtle movement from the speckle images. The results demonstrate that our network is superior to a convolutional neural network in both the accuracy and the efficiency of processing spatio-temporal speckle image data. The influence of different sampling rates on sound extraction is revealed through appropriate experimental settings. We also explain why our network generalizes better than a convolutional neural network. Benefiting from this strong generalization ability, our method can perform accurate and robust sound extraction on unseen objects. The excellent performance of our method shows that it is a significant development in the field of remote sound acquisition.
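To make the second step more concrete, the following is a minimal sketch of a convolutional LSTM cell applied to a short sequence of speckle frames. It is not the authors' network: it assumes a single channel, 3x3 kernels, no peephole connections, and untrained random weights, and the names `ConvLSTMCell`, `conv2d_same`, and `run_sequence` are hypothetical. It only illustrates how the gates are computed by convolution, so that the hidden state keeps the spatial layout of the frame while accumulating information over time.

```python
import math
import random


def conv2d_same(frame, kernel):
    """Zero-padded 'same' 2D cross-correlation of a frame with an odd-sized kernel."""
    height, width = len(frame), len(frame[0])
    size = len(kernel)
    radius = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(size):
                for dx in range(size):
                    yy, xx = y + dy - radius, x + dx - radius
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += frame[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def random_kernel(rng, size=3, scale=0.1):
    """Small random kernel; a trained model would learn these weights."""
    return [[rng.uniform(-scale, scale) for _ in range(size)] for _ in range(size)]


class ConvLSTMCell:
    """Single-channel ConvLSTM cell without peephole terms (illustrative only)."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.wx = {g: random_kernel(rng) for g in self.GATES}  # input-to-state
        self.wh = {g: random_kernel(rng) for g in self.GATES}  # state-to-state
        self.b = {g: 0.0 for g in self.GATES}

    def step(self, x, h_prev, c_prev):
        """Advance one frame: returns the new hidden and cell state maps."""
        rows, cols = len(x), len(x[0])
        pre = {}
        for g in self.GATES:
            ax = conv2d_same(x, self.wx[g])
            ah = conv2d_same(h_prev, self.wh[g])
            pre[g] = [[ax[r][k] + ah[r][k] + self.b[g] for k in range(cols)]
                      for r in range(rows)]
        h_new = [[0.0] * cols for _ in range(rows)]
        c_new = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for k in range(cols):
                i_gate = sigmoid(pre["i"][r][k])
                f_gate = sigmoid(pre["f"][r][k])
                o_gate = sigmoid(pre["o"][r][k])
                cand = math.tanh(pre["g"][r][k])
                c_new[r][k] = f_gate * c_prev[r][k] + i_gate * cand
                h_new[r][k] = o_gate * math.tanh(c_new[r][k])
        return h_new, c_new


def run_sequence(cell, frames):
    """Feed frames in time order, starting from zero states; return the last hidden map."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for frame in frames:
        h, c = cell.step(frame, h, c)
    return h


if __name__ == "__main__":
    rng = random.Random(1)
    # Five synthetic 6x6 "speckle" frames with intensities in [0, 1].
    frames = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(5)]
    hidden = run_sequence(ConvLSTMCell(seed=0), frames)
    print(len(hidden), len(hidden[0]))
```

In a complete model, a further layer (an assumption here, not something described in the abstract) would map the final hidden state to the displacement value that forms one sample of the recovered sound signal.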

Author information

Corresponding author

Correspondence to Hualin Zeng.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, D., Yang, L., Zeng, H. (2021). Remote Recovery of Sound from Speckle Pattern Video Based on Convolutional LSTM. In: Gao, D., Li, Q., Guan, X., Liao, X. (eds) Information and Communications Security. ICICS 2021. Lecture Notes in Computer Science, vol 12919. Springer, Cham. https://doi.org/10.1007/978-3-030-88052-1_7

  • DOI: https://doi.org/10.1007/978-3-030-88052-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88051-4

  • Online ISBN: 978-3-030-88052-1

  • eBook Packages: Computer Science, Computer Science (R0)
