Abstract
Active noise control (ANC) technology has been applied to reduce unwanted sound in the vehicle cabin. In this paper, a real-time ear tracking system supports ANC performance as the driver’s head moves. For robust long-term ear tracking, an offline-trained ear detector initializes the target area. From precisely pre-cropped image patches, a Siamese hierarchical refinement network (SHRNet) builds a high-fidelity feature map based on a Siamese pyramid branch. Hierarchical feature extraction with lateral refinement makes the most of all levels of feature representation. The offline matching network is trained on an augmented dataset built from the self-collected in-vehicle ear database and the ear-labeled McGill face video database. Further, Q-learning is used to learn a decision-making policy that refines the tracking strategy and improves efficiency. Extensive experimental results in various scenes on an NVIDIA Jetson TX2 show that the tracker runs at real-time speed while maintaining robust performance. In particular, the method achieves an AUC score of 67.6% at 26 fps on the self-collected in-vehicle ear database.
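The abstract reports an AUC score but does not say how it is computed. As a minimal sketch, assuming the common success-plot convention from object tracking benchmarks, the score can be read as the mean fraction of frames whose predicted box overlaps the ground truth by more than a threshold, averaged over evenly spaced thresholds in [0, 1]. The function names, the (x, y, w, h) box format, and the 21-threshold grid below are illustrative assumptions, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1].

    Assumption: a frame counts as a success at threshold t when IoU > t.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Under this convention a tracker that matches the ground truth exactly on every frame scores just under 1, because no overlap can strictly exceed the final threshold of 1.0.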
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 51675324 and No. 51805312) and in part by the Shanghai Sailing Program (No. 18YF1409400).
Cite this article
Zhang, W., Zou, Y. & Wang, Y. Ear tracking via Siamese hierarchical refinement network for local active noise control. J Real-Time Image Proc 18, 635–646 (2021). https://doi.org/10.1007/s11554-020-01000-y
DOI: https://doi.org/10.1007/s11554-020-01000-y