Abstract
Object tracking is an important proxy task towards action recognition. Recent successful CNN models for detection and segmentation, such as Faster R-CNN and Mask R-CNN, lead to an effective approach to the tracking problem: tracking-by-detection. This very fast type of tracker matches objects using only the Intersection-over-Union (IOU) between bounding boxes, without any other visual information. However, the lack of visual information in the IOU tracker, combined with missed detections from the CNN detectors, creates fragmented trajectories. Inspired by the work of Luc et al., which predicts future segmentations using optical flow, we propose an enhanced tracker based on tracking-by-detection and optical flow estimation for the vehicle tracking scenario. Our solution generates new detections or segmentations by translating the results of the CNN detectors backward and forward along optical flow vectors, which fills in the gaps of fragmented trajectories. Qualitative results show that our solution achieves stable performance with different types of flow estimation methods. The generated results are then matched to the fragmented trajectories using SURF features. The DAVIS dataset is used to evaluate the best way to generate new detections, and the entire pipeline is tested on the DETRAC dataset. The qualitative results show that our method significantly improves the fragmented trajectories.
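The two operations named in the abstract, IOU matching between boxes and translating a detection by optical-flow vectors, can be illustrated with a minimal Python sketch. The box format (x1, y1, x2, y2), the dense (H, W, 2) flow array, and the function names below are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def translate_box_by_flow(box, flow):
    """Shift a box by the mean optical-flow vector inside it.

    `flow` is an (H, W, 2) array of per-pixel (dx, dy) displacements,
    e.g. estimated by FlowNet 2.0 or PWC-Net. The shifted box is a
    hypothesised detection in the next frame (forward flow) or the
    previous frame (backward flow).
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    patch = flow[y1:y2, x1:x2]          # flow vectors under the detection
    if patch.size == 0:                  # box fell outside the frame
        return box
    dx, dy = patch.reshape(-1, 2).mean(axis=0)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

In this spirit, the last detection of one trajectory fragment can be warped forward (or the first detection of the next fragment warped backward) frame by frame; if the warped box overlaps the other fragment with a high IOU, the two fragments can be linked and the gap filled with the generated boxes.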
References
He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: IEEE International Conference on Computer Vision, Italy (2017)
Bochinski, E., Eiselein, V., Sikora, T.: High-speed tracking-by-detection without using image information. In: International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS, Italy (2017)
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)
Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 500–513 (2011)
Lyu, S., et al.: UA-DETRAC 2017: report of AVSS2017 & IWT4S challenge on advanced traffic monitoring. In: 14th IEEE International Conference on Advanced Video and Signal Based Surveillance AVSS (2017)
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI (2017)
Wang, L., Lu, Y., Wang, H., Zheng, Y., Ye, H., Xue, X.: Evolving boxes for fast vehicle detection. In: IEEE International Conference on Multimedia and Expo ICME (2017)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Luc, P., Couprie, C., LeCun, Y., Verbeek, J.: Predicting future instance segmentation by forecasting convolutional features. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 593–608. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_36
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: Computer Vision and Pattern Recognition CVPR (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: The IEEE Conference on Computer Vision and Pattern Recognition CVPR (2017)
Wang, H., Klaser, A., Schmid, C., Liu, C.-L.: Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. IJCV 103, 60–79 (2013)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE International Conference on Computer Vision ICCV (2013)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems NIPS (2014)
Wang, L., Qiao, Y., Tang, X.: Action recognition with trajectory-pooled deep-convolutional descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2015)
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI 39, 640–651 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)
Andriyenko, A., Schindler, K., Roth, S.: Discrete-continuous optimization for multi-target tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR (2012)
Bae, S., Yoon, K.: Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2014)
Dicle, C., Camps, O.I., Sznaier, M.: The way they move: tracking multiple targets with similar appearance. In: IEEE International Conference on Computer Vision ICCV (2013)
Wen, L., Li, W., Yan, J., Lei, Z., Yi, D., Li, S.Z.: Multiple target tracking based on undirected hierarchical relation hypergraph. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2014)
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2017)
Sun, D., Yang, X., Liu, M.-Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2018)
Villegas, R., Yang, J., Zou, Y., Sohn, S., Lin, X., Lee, H.: Learning to generate long-term future via hierarchical prediction. In: Proceedings of the 34th International Conference on Machine Learning ICML (2017)
Walker, J., Doersch, C., Gupta, A., Hebert, M.: An uncertain future: forecasting from static images using variational autoencoders. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 835–851. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_51
Mathieu, M., Couprie, C., LeCun, Y.: Deep multi-scale video prediction beyond mean square error. In: International Conference on Learning Representations ICLR (2016)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations ICLR (2016)
Vondrick, C., Pirsiavash, H., Torralba, A.: Anticipating the future by watching unlabeled video. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)
Chen, Q., Koltun, V.: Full flow: optical flow estimation by global optimization over regular grids. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)