Ear tracking via Siamese hierarchical refinement network for local active noise control

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Active noise control (ANC) technology has been applied to reduce unwanted sound in the vehicle cabin. In this paper, a real-time ear tracking system is proposed to maintain ANC performance as the driver’s head moves. For robust long-term ear tracking, an offline-trained ear detector initializes the target area. From precisely pre-cropped image patches, a Siamese hierarchical refinement network (SHRNet) builds a high-fidelity feature map based on a Siamese pyramid branch. Hierarchical feature extraction with lateral refinement makes the most of all levels of feature representation. The offline matching network is trained on an augmented dataset drawn from the self-collected in-vehicle ear database and the ear-labeled McGill face video database. Further, Q-learning is used to learn a decision-making policy that refines the tracking strategy and improves efficiency. Extensive experimental results in various scenes on an NVIDIA Jetson TX2 show that the tracker runs at real-time speed while maintaining robust performance. In particular, the method achieves an AUC score of 67.6% at 26 fps on the self-collected in-vehicle ear database.
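The abstract reports an AUC score without defining it. In the visual tracking literature this usually means the area under the success plot, that is, the fraction of frames whose predicted box overlaps the ground truth by more than a threshold, averaged over thresholds from 0 to 1. The sketch below illustrates that reading under stated assumptions: the `(x, y, w, h)` box format, the 21 evenly spaced thresholds, the strict `>` comparison, and the function names `iou` and `success_auc` are all illustrative choices, not details taken from the paper.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1].

    Assumed convention: a frame counts as a success at threshold t
    when its IoU is strictly greater than t.
    """
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = np.array([(overlaps > t).mean() for t in thresholds])
    return float(success.mean())


if __name__ == "__main__":
    # Hypothetical boxes for three frames, purely for illustration.
    gt = [(10, 10, 40, 60), (12, 11, 40, 60), (15, 12, 40, 60)]
    pred = [(11, 10, 40, 60), (20, 15, 40, 60), (60, 80, 40, 60)]
    print(f"success AUC: {success_auc(pred, gt):.3f}")
```

Under this convention a tracker that reproduces the ground truth exactly scores just below 1, because no overlap can exceed the final threshold of 1.0.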

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 51675324 and 51805312) and in part by the Shanghai Sailing Program (No. 18YF1409400).

Author information

Corresponding author

Correspondence to Yansong Wang.

About this article

Cite this article

Zhang, W., Zou, Y. & Wang, Y. Ear tracking via Siamese hierarchical refinement network for local active noise control. J Real-Time Image Proc 18, 635–646 (2021). https://doi.org/10.1007/s11554-020-01000-y

  • DOI: https://doi.org/10.1007/s11554-020-01000-y
