
On Fusion of Learned and Designed Features for Video Data Analytics

  • Conference paper
  • Conference: MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)


Abstract

Video cameras have become widely used for indoor and outdoor surveillance. Covering more and more public space in cities, the cameras serve various purposes ranging from security to traffic monitoring, urban life, and marketing. However, as the number of deployed cameras and recorded streams grows, manual video monitoring and analysis becomes too laborious. The goal is to obtain effective and efficient artificial intelligence models that process the video data automatically and produce the features desired for data analytics. To this end, we propose a framework for real-time video feature extraction that fuses learned and hand-designed analytical models and is applicable in real-life situations. Nowadays, state-of-the-art models for various computer vision tasks are implemented by deep learning. However, the exhaustive gathering of labeled training data and the computational complexity of the resulting models can often render them impractical. We need to consider the benefits and limitations of each technique and find the synergy between deep learning and analytical models. Deep learning methods are better suited to simpler tasks on large volumes of dense data, while analytical modeling can be sufficient for processing sparse data with complex structures. Our framework follows these principles by taking advantage of multiple levels of abstraction. In a use case, we show how the framework can be configured for advanced video analysis of urban life.
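
To make the fusion principle concrete, here is a minimal sketch of the general idea (an illustration, not the framework's actual implementation). A hypothetical `detect` stub stands in for a learned deep detector consuming dense frame data, while a constant-velocity Kalman filter, a classic hand-designed analytical model, turns the detector's sparse per-frame outputs into a smoothed trajectory, a structured feature suitable for further analytics.

```python
# Minimal sketch of the fusion idea: a learned model handles dense pixel
# data; a hand-designed analytical model processes the sparse, structured
# detections it emits. `detect` is a hypothetical stand-in for a deep
# detector; all names and constants here are illustrative assumptions.

import numpy as np

def detect(frame):
    """Placeholder for a learned detector: returns (x, y) object centers."""
    # In a real pipeline this would run a CNN over the dense frame data.
    return [np.array([10.0, 20.0])]

class KalmanTracker:
    """Hand-designed constant-velocity model over sparse detections."""
    def __init__(self, x, y):
        self.state = np.array([x, y, 0.0, 0.0])   # position + velocity
        self.P = np.eye(4)                         # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0          # constant-velocity motion
        self.H = np.eye(2, 4)                      # we observe position only
        self.Q = 0.01 * np.eye(4)                  # process noise
        self.R = 1.0 * np.eye(2)                   # measurement noise

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        y = z - self.H @ self.state                # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.state = self.state + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Fusion loop: dense frames -> learned detections -> analytical trajectory.
tracker = None
for frame in range(5):                             # stand-in for video frames
    for z in detect(frame):
        if tracker is None:
            tracker = KalmanTracker(*z)
        else:
            tracker.predict()
            tracker.update(z)
print("smoothed position:", tracker.state[:2])
```

In a real deployment the stub would be replaced by an SSD- or YOLO-style network, and the tracker extended with data association so that multiple objects can be followed at once.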


Notes

  1. A re-identified object is a previously recognized object that is identified again under different conditions (different scene/camera, lighting, color balance, image resolution, object pose, etc.).
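
One common way to realize re-identification, sketched below under the assumption of a generic embedding-based approach (the gallery, threshold, and data are illustrative placeholders, not the paper's method), is to map each detected object to a learned feature vector and re-identify a newly observed object by nearest-neighbor search over previously seen objects.

```python
# Illustrative sketch of re-identification by embedding similarity. A learned
# model maps each detected object to a feature vector; an object seen under
# new conditions is re-identified by its nearest neighbor in that space.

import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Gallery of previously recognized objects: id -> embedding (placeholder data).
gallery = {"person_1": np.array([0.9, 0.1, 0.3]),
           "person_2": np.array([0.2, 0.8, 0.5])}

query = np.array([0.85, 0.15, 0.35])   # same object, different camera/lighting

best_id, best_sim = max(((oid, cosine(emb, query))
                         for oid, emb in gallery.items()),
                        key=lambda t: t[1])
if best_sim > 0.8:                      # assumed similarity threshold
    print(f"re-identified as {best_id} (similarity {best_sim:.3f})")
```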


Author information


Corresponding author

Correspondence to Tomáš Skopal.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Dobranský, M., Skopal, T. (2021). On Fusion of Learned and Designed Features for Video Data Analytics. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol. 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_23

  • DOI: https://doi.org/10.1007/978-3-030-67835-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67834-0

  • Online ISBN: 978-3-030-67835-7

  • eBook Packages: Computer Science, Computer Science (R0)
