Abstract
Concept-free search, which embeds text and video signals in a joint space for retrieval, appears to be the new state of the art. However, this search paradigm suffers from two limitations. First, the search results are unpredictable and not interpretable. Second, the embedded features lie in a high-dimensional space, which hinders real-time indexing and search. In this paper, we present a new implementation of the Vireo video search system (Vireo-VSS), which employs a dual-task model to index each video segment with a low-dimensional embedding feature and a concept list for retrieval. The concept list serves as a reference for interpreting its associated embedded feature. With these changes, we design a SQL-like querying interface in which a user can specify the search content (subject, predicate, object) and constraint (logical condition) in a semi-structured way. The system decomposes the SQL-like query into multiple sub-queries depending on the specified constraint. Each sub-query is translated into an embedding feature and a concept list for video retrieval. The search result is compiled by taking the union of, or pruning, the search lists from the sub-queries. The SQL-like interface is also extended for temporal querying by providing multiple SQL templates with which users specify the temporal evolution of a query.
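The pipeline described in the abstract (decompose a query into sub-queries, translate each into an embedding and a concept list, then merge the ranked lists by union or pruning) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the Vireo-VSS implementation: the toy index, the `QUERY_MODEL` lookup table standing in for the dual-task model, the equal weighting `alpha` of the two scores, and the mapping of OR to union and AND/NOT to pruning.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Segment:
    seg_id: str
    embedding: tuple     # low-dimensional embedding (toy 3-D vectors here)
    concepts: frozenset  # concept list that interprets the embedding


# Hypothetical toy index of video segments.
INDEX = [
    Segment("s1", (1.0, 0.0, 0.0), frozenset({"dog", "run", "beach"})),
    Segment("s2", (0.9, 0.1, 0.0), frozenset({"dog", "sleep", "sofa"})),
    Segment("s3", (0.0, 1.0, 0.0), frozenset({"man", "ride", "bike"})),
    Segment("s4", (0.0, 0.9, 0.1), frozenset({"man", "ride", "horse", "beach"})),
]

# Hypothetical stand-in for the dual-task model on the query side:
# each sub-query text maps to (embedding, concept list).
QUERY_MODEL = {
    "dog run": ((1.0, 0.0, 0.0), {"dog", "run"}),
    "man ride bike": ((0.0, 1.0, 0.0), {"man", "ride", "bike"}),
    "beach": ((0.5, 0.5, 0.0), {"beach"}),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(sub_query, top_k=2, alpha=0.5):
    """Rank segments for one sub-query by embedding and concept scores."""
    emb, concepts = QUERY_MODEL[sub_query]
    scored = []
    for seg in INDEX:
        concept_score = len(concepts & seg.concepts) / len(concepts)
        score = alpha * cosine(emb, seg.embedding) + (1 - alpha) * concept_score
        scored.append((score, seg.seg_id))
    # Highest score first; ties broken by segment id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [seg_id for _, seg_id in scored[:top_k]]


def run_query(sub_queries, operator, top_k=2):
    """Merge the per-sub-query lists (assumed OR/AND/NOT semantics)."""
    lists = [search(q, top_k) for q in sub_queries]
    first, rest = lists[0], lists[1:]
    if operator == "OR":   # union, keeping first-seen order
        merged = []
        for lst in lists:
            merged.extend(s for s in lst if s not in merged)
        return merged
    if operator == "AND":  # prune the first list to segments found by all
        return [s for s in first if all(s in lst for lst in rest)]
    if operator == "NOT":  # prune segments found by the later sub-queries
        return [s for s in first if not any(s in lst for lst in rest)]
    raise ValueError(f"unsupported operator: {operator}")
```

For example, `run_query(["dog run", "beach"], "AND")` keeps only the segments that appear in both ranked lists, while `run_query(["beach", "dog run"], "NOT")` removes from the "beach" list any segment that also matches "dog run". Because each sub-query carries a concept list, the concepts behind a given result can be inspected directly, which is the interpretability role the abstract assigns to the concept list.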
Change history
21 January 2021
The original version of the book was inadvertently published with an incorrect acknowledgement in chapter 34. The acknowledgement has been corrected and reads as follows:
Acknowledgement: The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).
The affiliation of the third author, Zhixin Ma, was incorrect. In the contribution it read “School of Information System,” but correctly it should be “School of Computing and Information Systems”.
The affiliation of the last author, Chong-Wah Ngo, was not correct. In the book it read “Department of Computer Science, City University of Hong Kong, Hong Kong, China”. Instead, the correct affiliation is: “School of Computing and Information Systems, Singapore Management University, Singapore, Singapore”.
Additionally, his e-mail address “cscwngo@cityu.edu.hk” was also incorrect. The correct e-mail address is: “cwngo@smu.edu.sg”.
The chapter and the book have been updated with the changes.
Acknowledgement
The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Nguyen, P.A., Ma, Z., Ngo, CW. (2021). SQL-Like Interpretable Interactive Video Search. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_34
DOI: https://doi.org/10.1007/978-3-030-67835-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)