DOI: 10.1145/3319921.3319950
research-article

Inference Adaptive Thresholding based Non-Maximum Suppression for Object Detection in Video Image Sequence

Published: 15 March 2019

ABSTRACT

This study proposes a novel inference adaptive thresholding based non-maximum suppression (IAT-NMS) algorithm that derives temporal cues between the frames of a video sequence. Temporal connectivity is first inferred from an overlap measure between bounding boxes in adjacent frames. Frames containing high-confidence detections are taken as key frames to leverage the scores of neighboring detections and to preserve potential detections of blurred objects with low scores. Then, the bounding boxes within each frame are ranked by confidence score, and the overlap ratio between the highest-scoring box and each of the remaining surrounding boxes is computed. This overlap measure is fed into a Gaussian function to estimate weights for adaptive suppression, softly reducing the detection scores of boxes that may belong to severely overlapping objects. The proposed method is compared with state-of-the-art video object detection techniques. With IAT-NMS, overlapping objects that were indistinguishable under the compared methods become detectable. Experimental results demonstrate that this simple, unsupervised method outperforms state-of-the-art NMS algorithms, with an increase of 6% in mean average precision (mAP) on the ImageNet VID dataset. Our study of performance limitations and sensitivity to parameter variations also finds that IAT-NMS has better detection capability than the three compared algorithms, which fail to detect all targets or to distinguish them when multiple targets overlap.
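The within-frame step described in the abstract (rank boxes by score, measure overlap against the top box, then decay scores with a Gaussian weight) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact Gaussian form, so the weight exp(-IoU²/σ) used here is an assumption, as are the function names `iou` and `gaussian_soft_nms`, the `(x1, y1, x2, y2)` box layout, and the `sigma` and `score_floor` parameters. The temporal key-frame rescoring is not covered.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gaussian_soft_nms(boxes, scores, sigma=0.5, score_floor=0.001):
    """Softly suppress overlapping boxes within a single frame.

    Assumed weighting (not taken from the paper): each remaining score is
    multiplied by exp(-IoU^2 / sigma) with respect to the current top box.
    Returns a list of (box, score) pairs in the order they were selected.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring box still under consideration.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))

        # Decay the other scores according to their overlap with it.
        decayed = []
        for box, score in remaining:
            weight = math.exp(-(iou(best_box, box) ** 2) / sigma)
            new_score = score * weight
            if new_score >= score_floor:  # drop boxes that fall below the floor
                decayed.append((box, new_score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    for box, score in gaussian_soft_nms(demo_boxes, demo_scores):
        print(box, round(score, 3))
```

Unlike hard NMS, a heavily overlapped box is kept with a reduced score rather than discarded, which is what allows a second, partially occluded object to remain detectable.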

    • Published in

      ICIAI '19: Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence
      March 2019
      279 pages
      ISBN:9781450361286
      DOI:10.1145/3319921

      Copyright © 2019 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
