
Vehicles Tracking by Combining Convolutional Neural Network Based Segmentation and Optical Flow Estimation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12002))

Abstract

Object tracking is an important proxy task towards action recognition. Recent successful CNN models for detection and segmentation, such as Faster R-CNN and Mask R-CNN, enable an effective approach to the tracking problem: tracking-by-detection. This very fast type of tracker matches objects using only the Intersection-over-Union (IOU) between bounding boxes, without any other visual information. However, the lack of visual information in the IOU tracker, combined with missed detections from CNN detectors, creates fragmented trajectories. Inspired by the work of Luc et al., which predicts future segmentations using optical flow, we propose an enhanced tracker that combines tracking-by-detection with optical flow estimation in a vehicle tracking scenario. Our solution generates new detections or segmentations by translating the results of CNN detectors backward and forward along optical flow vectors, which fills in the gaps of trajectories. We then match the generated results with the fragmented trajectories using SURF features. The DAVIS dataset is used to evaluate the best way to generate new detections, and the entire pipeline is tested on the DETRAC dataset. Qualitative results show that our solution achieves stable performance with different optical flow estimation methods and significantly improves the fragmented trajectories.
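The two core operations the abstract describes, IOU-based matching of detections and translating a detection along the estimated flow to fill a gap, can be sketched as follows. This is a minimal illustration with NumPy, not the authors' code: the function names are invented, boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, and `flow` is assumed to be a dense `(H, W, 2)` field of per-pixel `(dx, dy)` displacements, as produced by estimators such as FlowNet 2.0 or PWC-Net.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def translate_box_by_flow(box, flow):
    """Shift a box by the mean flow vector inside it.

    `flow` is an (H, W, 2) array of per-pixel (dx, dy) displacements;
    averaging over the box region gives one translation for the object.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    patch = flow[y1:y2, x1:x2]                 # flow vectors under the detection
    dx, dy = patch.reshape(-1, 2).mean(axis=0)  # mean (dx, dy) over the region
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

In an IOU tracker in the style of Bochinski et al., detections in consecutive frames are linked when their IOU exceeds a threshold (commonly around 0.5); when a detection is missing, a box translated by the flow can stand in for it so the match does not break.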


References

  1. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: IEEE International Conference on Computer Vision ICCV, Italy (2017)

  2. Bochinski, E., Eiselein, V., Sikora, T.: High-speed tracking-by-detection without using image information. In: International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS, Italy (2017)

  3. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)

  4. Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 500–513 (2011)

  5. Lyu, S., et al.: UA-DETRAC 2017: report of AVSS2017 & IWT4S challenge on advanced traffic monitoring. In: 14th IEEE International Conference on Advanced Video and Signal Based Surveillance AVSS (2017)

  6. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI (2017)

  7. Wang, L., Lu, Y., Wang, H., Zheng, Y., Ye, H., Xue, X.: Evolving boxes for fast vehicle detection. In: IEEE International Conference on Multimedia and Expo ICME (2017)

  8. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  9. Luc, P., Couprie, C., LeCun, Y., Verbeek, J.: Predicting future instance segmentation by forecasting convolutional features. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 593–608. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_36

  10. Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)

  11. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2017)

  12. Wang, H., Kläser, A., Schmid, C., Liu, C.-L.: Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. IJCV 103, 60–79 (2013)

  13. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE International Conference on Computer Vision ICCV (2013)

  14. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems NIPS (2014)

  15. Wang, L., Qiao, Y., Tang, X.: Action recognition with trajectory-pooled deep-convolutional descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2015)

  16. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI 39, 640–651 (2017)

  17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)

  18. Andriyenko, A., Schindler, K., Roth, S.: Discrete-continuous optimization for multi-target tracking. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2012)

  19. Bae, S., Yoon, K.: Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2014)

  20. Dicle, C., Camps, O.I., Sznaier, M.: The way they move: tracking multiple targets with similar appearance. In: IEEE International Conference on Computer Vision ICCV (2013)

  21. Wen, L., Li, W., Yan, J., Lei, Z., Yi, D., Li, S.Z.: Multiple target tracking based on undirected hierarchical relation hypergraph. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2014)

  22. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2017)

  23. Sun, D., Yang, X., Liu, M.-Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2018)

  24. Villegas, R., Yang, J., Zou, Y., Sohn, S., Lin, X., Lee, H.: Learning to generate long-term future via hierarchical prediction. In: Proceedings of the 34th International Conference on Machine Learning ICML (2017)

  25. Walker, J., Doersch, C., Gupta, A., Hebert, M.: An uncertain future: forecasting from static images using variational autoencoders. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 835–851. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_51

  26. Mathieu, M., Couprie, C., LeCun, Y.: Deep multi-scale video prediction beyond mean square error. In: International Conference on Learning Representations ICLR (2016)

  27. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations ICLR (2016)

  28. Vondrick, C., Pirsiavash, H., Torralba, A.: Anticipating the future by watching unlabeled video. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)

  29. Chen, Q., Koltun, V.: Full flow: optical flow estimation by global optimization over regular grids. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2016)


Corresponding author

Correspondence to Tuan-Hung Vu.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Vu, T.H., Boonaert, J., Ambellouis, S., Ahmed, A.T. (2020). Vehicles Tracking by Combining Convolutional Neural Network Based Segmentation and Optical Flow Estimation. In: Blanc-Talon, J., Delmas, P., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2020. Lecture Notes in Computer Science, vol 12002. Springer, Cham. https://doi.org/10.1007/978-3-030-40605-9_45


  • DOI: https://doi.org/10.1007/978-3-030-40605-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40604-2

  • Online ISBN: 978-3-030-40605-9

  • eBook Packages: Computer Science, Computer Science (R0)
