Abstract
It is common practice to treat a video as a sequence of images (frames) and to re-use deep neural network models trained only on images for similar analytics tasks on videos. In this paper, we show that this “leap of faith”, the assumption that deep learning models that work well on images will also work well on videos, is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors such as video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually, but are perceived quite differently by the video analytics applications. We find that the root cause of these fluctuations is the dynamic camera-parameter changes that a video camera makes automatically in order to capture and produce a visually pleasing video. The camera inadvertently acts as an “unintentional adversary” because, as we show, these slight changes in pixel values across consecutive frames have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect, we explore transfer learning techniques that improve learning in video analytics tasks by transferring knowledge from learning on image analytics tasks. Our experiments with a number of different cameras and a variety of video analytics tasks show that the camera’s adversarial effect can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained Yolov5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (~40% fewer tracking mistakes). Our paper also provides new directions and techniques to mitigate the camera’s adversarial effect on deep learning models used for video analytics applications.
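The fluctuation the abstract describes is easy to probe. The sketch below is ours, not the authors’ code: in Python, it assumes the public ultralytics/yolov5 torch.hub entry point and a hypothetical frame.jpg from a static scene, and it applies human-imperceptible brightness shifts to one frame, standing in for the small exposure adjustments a camera’s auto-exposure loop makes between consecutive frames, before comparing the detections an image-trained Yolov5 model returns.

import torch
from PIL import Image, ImageEnhance

# Load an image-trained detector via the public ultralytics/yolov5 hub entry point.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

# One frame from a static scene; the path is hypothetical.
frame = Image.open("frame.jpg")

# Brightness factors within +/-5 percent look near-identical to a human viewer,
# mimicking the auto-exposure adjustments made between consecutive frames.
for factor in (0.95, 1.00, 1.05):
    perturbed = ImageEnhance.Brightness(frame).enhance(factor)
    results = model(perturbed)
    detections = results.xyxy[0]  # tensor of boxes above the confidence threshold
    print(f"brightness x{factor:.2f}: {len(detections)} objects detected")

# Differing counts across the three visually identical inputs reproduce the
# paper's observation: the detector's output fluctuates even though the scene
# has not changed in any human-perceptible way.

If the counts (or confidences) differ across these visually identical inputs, a real camera’s automatic parameter changes would have the same effect frame to frame. The paper’s mitigation is transfer learning: briefly fine-tuning the image-trained weights on frames captured under such varying camera settings so the detector becomes less sensitive to these perturbations.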
Notes
1. We observe 1–2 false positive detections for Yolov5 and EfficientDet.
References
Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468 (2016). https://doi.org/10.1109/ICIP.2016.7533003
Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Canel, C., et al.: Scaling video analytics on constrained edge nodes. In: Proceedings of Machine Learning and Systems, vol. 1, pp. 406–417 (2019)
Chen, T.Y.H., Ravindranath, L., Deng, S., Bahl, P., Balakrishnan, H.: Glimpse: continuous, real-time object recognition on mobile devices. In: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, pp. 155–168 (2015)
Cheng, M., Lei, Q., Chen, P.Y., Dhillon, I., Hsieh, C.J.: CAT: customized adversarial training for improved robustness. arXiv preprint arXiv:2002.06789 (2020)
Chiu, Y.C., Tsai, C.Y., Ruan, M.D., Shen, G.Y., Lee, T.T.: Mobilenet-SSDv2: an improved object detection model for embedded systems. In: 2020 International Conference on System Science and Engineering (ICSSE), pp. 1–5. IEEE (2020)
CNET: How 5G aims to end network latency (2019). CNET_5G_network_latency_time
AXIS Communications: Vapix library. https://www.axis.com/vapix-library/
Connell, J., Fan, Q., Gabbur, P., Haas, N., Pankanti, S., Trinh, H.: Retail video analytics: an overview and survey. In: Video Surveillance and Transportation Imaging Applications, vol. 8663, pp. 260–265 (2013)
Deng, J., Guo, J., Yuxiang, Z., Yu, J., Kotsia, I., Zafeiriou, S.: RetinaFace: single-stage dense face localisation in the wild. arXiv preprint arXiv:1905.00641 (2019)
Du, K., et al.: Server-driven video streaming for deep learning inference. In: Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 557–570 (2020)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J.M., Zisserman, A.: The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010). https://doi.org/10.1007/s11263-009-0275-4
Fang, Y., Zhan, B., Cai, W., Gao, S., Hu, B.: Locality-constrained spatial transformer network for video crowd counting. arXiv preprint arXiv:1907.07911 (2019)
Gaikwad, V., Rake, R.: Video Analytics Market Statistics: 2027 (2021). https://www.alliedmarketresearch.com/video-analytics-market
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231 (2018)
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330. PMLR (2017)
Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. In: Proceedings of the International Conference on Learning Representations (2019)
Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15262–15271, June 2021
Jiang, J., Ananthanarayanan, G., Bodik, P., Sen, S., Stoica, I.: Chameleon: scalable adaptation of video analytics. In: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pp. 253–266 (2018)
Jin, C., Rinard, M.: Manifold regularization for locally stable deep neural networks. arXiv preprint arXiv:2003.04286 (2020)
Jocher, G., et al.: ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO export and inference (2022). https://doi.org/10.5281/zenodo.6222936
Kang, D., Emmons, J., Abuzaid, F., Bailis, P., Zaharia, M.: NoScope: optimizing neural network queries over video at scale. arXiv preprint arXiv:1703.02529 (2017)
Koh, P.W., et al.: WILDS: a benchmark of in-the-wild distribution shifts. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 5637–5664. PMLR, 18–24 July 2021. https://proceedings.mlr.press/v139/koh21a.html
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Li, Y., Padmanabhan, A., Zhao, P., Wang, Y., Xu, G.H., Netravali, R.: Reducto: on-camera filtering for resource-efficient real-time video analytics. In: Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 359–376 (2020)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Lisota, K.: Understanding video frame rate and shutter speed (2020). https://kevinlisota.photography/2020/04/understanding-video-frame-rate-and-shutter-speed/
Liu, L., Li, H., Gruteser, M.: Edge assisted real-time object detection for mobile augmented reality. In: The 25th Annual International Conference on Mobile Computing and Networking, pp. 1–16 (2019)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJzIBfZAb
Minderer, M., et al.: Revisiting the calibration of modern neural networks. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Najafi, A., Maeda, S.I., Koyama, M., Miyato, T.: Robustness to adversarial perturbations in learning from incomplete data. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Nyberg, O., Klami, A.: Reliably calibrated isotonic regression. In: Karlapalem, K., et al. (eds.) PAKDD 2021. LNCS (LNAI), vol. 12712, pp. 578–589. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75762-5_46
Otani, A., Hashiguchi, R., Omi, K., Fukushima, N., Tamaki, T.: On the performance evaluation of action recognition models on transcoded low quality videos. arXiv preprint arXiv:2204.09166 (2022)
Paul, S., et al.: CamTuner: reinforcement-learning based system for camera parameter tuning to enhance analytics (2021). https://doi.org/10.48550/ARXIV.2107.03964. https://arxiv.org/abs/2107.03964
Qualcomm: How 5G low latency improves your mobile experiences (2019). Qualcomm_5G_low-latency_improves_mobile_experience
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
Sinha, D., El-Sharkawy, M.: Thin MobileNet: an enhanced MobileNet architecture. In: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0280–0285. IEEE (2019)
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
Viso.ai: Top 16 applications of computer vision in video surveillance and security. https://viso.ai/applications/computer-vision-applications-in-surveillance-and-security/
Wang, L., Sng, D.: Deep learning algorithms with applications to video analytics for a smart city: a survey. arXiv e-prints (2015)
Wenkel, S., Alhazmi, K., Liiv, T., Alrshoud, S., Simon, M.: Confidence score: the forgotten dimension of object detection performance evaluation. Sensors 21(13), 4350 (2021)
Witte, R., Witte, J.: A T-test for related measures. In: Statistics, pp. 273–285 (2017). ISBN: 9781119254515. www.wiley.com/college/witte
Xie, C., Tan, M., Gong, B., Wang, J., Yuille, A.L., Le, Q.V.: Adversarial examples improve image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves ImageNet classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
Zhang, B., Jin, X., Ratnasamy, S., Wawrzynek, J., Lee, E.A.: AWStream: adaptive wide-area streaming analytics. In: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pp. 236–252 (2018)
Zhang, H., Ananthanarayanan, G., Bodik, P., Philipose, M., Bahl, P., Freedman, M.J.: Live video analytics at scale with approximation and delay-tolerance. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2017), pp. 377–392 (2017)
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., Jordan, M.: Theoretically principled trade-off between robustness and accuracy. In: International Conference on Machine Learning, pp. 7472–7482. PMLR (2019)
Acknowledgment
This project is supported in part by NEC Labs America and by NSF grant 2211459.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Paul, S. et al. (2023). Why Is the Video Analytics Accuracy Fluctuating, and What Can We Do About It? In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_28
DOI: https://doi.org/10.1007/978-3-031-25056-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25055-2
Online ISBN: 978-3-031-25056-9