On Fusion of Learned and Designed Features for Video Data Analytics

Dobranský, Marek; Skopal, Tomáš

doi:10.1007/978-3-030-67835-7_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

Video cameras have become widely used for indoor and outdoor surveillance. Covering more and more public space in cities, the cameras serve various purposes, ranging from security to traffic monitoring, urban-life analysis, and marketing. However, as the number of cameras and recorded streams grows, manual video monitoring and analysis become too laborious. The goal is to obtain effective and efficient artificial intelligence models that process the video data automatically and produce the features desired for data analytics. To this end, we propose a framework for real-time video feature extraction that fuses both learned and hand-designed analytical models and is applicable in real-life situations. Today, state-of-the-art models for various computer vision tasks are based on deep learning. However, the exhaustive collection of labeled training data and the computational complexity of the resulting models can often render them impractical. We need to consider the benefits and limitations of each technique and find a synergy between deep learning and analytical models. Deep learning methods are better suited to simpler tasks on large volumes of dense data, while analytical modeling can suffice for processing sparse data with complex structures. Our framework follows these principles by taking advantage of multiple levels of abstraction. In a use case, we show how the framework can be configured for advanced video analysis of urban life.
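The abstract does not spell out the framework's concrete components, so the following Python sketch is only an illustration of the "multiple levels of abstraction" idea under stated assumptions, not the authors' method. A learned detector (stubbed out with hypothetical detections; in practice a network such as SSD or YOLO) produces dense per-frame outputs, a hand-designed analytical layer links them into sparse trajectories, and designed features are computed on those trajectories. The names `Detection`, `Track`, `associate`, and `trajectory_features`, the greedy nearest-centroid association, and the `max_dist`, `max_gap`, and `fps` values are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Detection:
    """Hypothetical output of a learned detector (level 1, dense data)."""
    frame: int
    label: str
    x: float  # box centre in pixels
    y: float
    score: float


@dataclass
class Track:
    """Sparse trajectory built by the analytical layer (level 2)."""
    track_id: int
    label: str
    points: list = field(default_factory=list)  # (frame, x, y) tuples


def associate(detections, max_dist=50.0, max_gap=5):
    """Greedy nearest-centroid tracker (an assumed hand-designed model).

    Each detection joins the closest track of the same label whose last
    point is within max_dist pixels and at most max_gap frames old;
    otherwise it starts a new track. A track is extended at most once
    per frame.
    """
    by_frame = {}
    for d in detections:
        by_frame.setdefault(d.frame, []).append(d)
    tracks, next_id = [], 0
    for frame in sorted(by_frame):
        used = set()
        # Handle confident detections first so they claim tracks first.
        for d in sorted(by_frame[frame], key=lambda det: -det.score):
            best, best_dist = None, max_dist
            for t in tracks:
                last_frame, lx, ly = t.points[-1]
                if t.track_id in used or t.label != d.label:
                    continue
                if frame - last_frame > max_gap:
                    continue
                dist = hypot(d.x - lx, d.y - ly)
                if dist <= best_dist:
                    best, best_dist = t, dist
            if best is None:
                best = Track(next_id, d.label)
                tracks.append(best)
                next_id += 1
            best.points.append((frame, d.x, d.y))
            used.add(best.track_id)
    return tracks


def trajectory_features(track, fps=25.0):
    """Designed high-level features of one trajectory (level 3)."""
    (f0, x0, y0), (f1, x1, y1) = track.points[0], track.points[-1]
    path = sum(hypot(b[1] - a[1], b[2] - a[2])
               for a, b in zip(track.points, track.points[1:]))
    duration = (f1 - f0) / fps
    return {
        "label": track.label,
        "duration_s": duration,
        "path_px": path,
        "mean_speed_px_s": path / duration if duration > 0 else 0.0,
        "displacement_px": hypot(x1 - x0, y1 - y0),
    }


if __name__ == "__main__":
    # Hypothetical detections: a person walking right, a parked car.
    dets = [Detection(f, "person", 100 + 10 * f, 200, 0.9) for f in range(5)]
    dets += [Detection(f, "car", 400, 300, 0.8) for f in range(5)]
    for t in associate(dets):
        print(t.track_id, trajectory_features(t))
```

The split mirrors the abstract's argument: the expensive learned model only has to solve the simple, dense task (finding objects per frame), while the cheap designed layers reason about the sparse, structured trajectories, where labeled training data would be hard to collect.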


Notes

1. A re-identified object is a previously recognized object that is identified again in different conditions (different scene/camera, lighting, color balance, image resolution, object pose, etc.).


Author information

Authors and Affiliations

SIRET Research Group, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
Marek Dobranský & Tomáš Skopal


Corresponding author

Correspondence to Tomáš Skopal.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras


About this paper

Cite this paper

Dobranský, M., Skopal, T. (2021). On Fusion of Learned and Designed Features for Video Data Analytics. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_23


DOI: https://doi.org/10.1007/978-3-030-67835-7_23
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)
